

STATISTICS IN PSYCHOLOGY
AND EDUCATION


"If we take in our hand any volume . . . let us 
ask, Does it contain any abstract reasoning concerning 
quantity or number? No. Does it contain any experi- 
mental reasoning concerning matter of fact and existence? 
No. Commit it then to the flames: for it can 
contain nothing but sophistry and illusion!” 

Hume, David, An Enquiry Concerning Human 
Understanding, (1777). 


STATISTICS IN
PSYCHOLOGY
AND EDUCATION


BY
HENRY E. GARRETT, Ph.D.


PROFESSOR OF PSYCHOLOGY, COLUMBIA UNIVERSITY 


WITH AN INTRODUCTION BY 
R. S. WOODWORTH


PROFESSOR EMERITUS OF PSYCHOLOGY 
COLUMBIA UNIVERSITY 


THIRD EDITION 


LONGMANS, GREEN AND CO. 


NEW YORK - LONDON : TORONTO 


LONGMANS, GREEN AND CO., INC. 
55 FIFTH AVENUE, NEW YORK 3


LONGMANS, GREEN AND CO. LTD.
6 & 7 CLIFFORD STREET, LONDON W1


LONGMANS, GREEN AND CO. 
215 VICTORIA STREET, TORONTO 1 


GARRETT 
STATISTICS IN PSYCHOLOGY AND EDUCATION 


COPYRIGHT 1926, 1937, AND 1947
BY LONGMANS, GREEN AND CO., INC. 


ALL RIGHTS RESERVED, INCLUDING THE RIGHT TO REPRODUCE 
THIS BOOK, OR ANY PORTION THEREOF, IN ANY FORM


FIRST EDITION, JANUARY 1926 
TEN PRINTINGS 


SECOND EDITION, REWRITTEN JUNE 1937 
EIGHT PRINTINGS 


THIRD EDITION, REWRITTEN JANUARY 1947 
JULY 1947, NOVEMBER 1947 
OCTOBER 1948, SEPTEMBER 1949 
JUNE 1950, MAY 1951 


Printed in the United States of America 


VAN REES PRESS • NEW YORK 


INTRODUCTION 


Modern problems and needs are forcing statistical methods and statistical ideas more and more to the fore. There are so many things we wish to know which cannot be discovered by a single observation, or by a single measurement. We wish to envisage the behavior of a man who, like all men, is rather a variable quantity, and must be observed repeatedly and not once for all. We wish to study the social group, composed of individuals differing one from another. We should like to be able to compare one group with another, one race with another, as well as one individual with another individual, or the individual with the norm for his age, race or class. We wish to trace the curve which pictures the growth of a child, or of a population. We wish to disentangle the interwoven factors of heredity and environment which influence the development of the individual, and to measure the similarly interwoven effects of laws, social customs and economic conditions upon public health, safety and welfare generally. Even if our statistical appetite is far from keen, we all of us should like to know enough to understand, or to withstand, the statistics that are constantly being thrown at us in print or conversation, much of it pretty bad statistics. The only cure for bad statistics is apparently more and better statistics. All in all, it certainly appears that the rudiments of sound statistical sense are coming to be an essential of a liberal education.

Now there are different orders of statisticians. There is, first in order, the mathematician who invents the method for performing a certain type of statistical job. His interest, as a mathematician, is not in the educational, social or psychological problems just alluded to, but in the problem of devising instruments for handling such matters. He is the tool-maker of the statistical industry, and one good tool-maker can supply many skilled workers. The latter are quite another order of statisticians. Supply them with the mathematician's formulas, map out the procedure for them to follow, provide working charts, tables and calculating machines, and they will compute from your data the necessary averages, probable errors and correlation coefficients. Their interest, as computers, lies in the quick and accurate handling of the tools of the trade. But there is a statistician of yet another order, in between the other two. His primary interest is psychological, perhaps, or it may be educational. It is he who has selected the scientific or practical problem, who has organized his attack upon the problem in such fashion that the data obtained can be handled in some sound statistical way. He selects the statistical tools to be employed, and, when the computers have done their work, he scrutinizes the results for their bearing upon the scientific or practical problem with which he started. Such an one, in short, must have a discriminating knowledge of the kit of tools which the mathematician has handed him, as well as some skill in their actual use.


The reader of the present book will quickly discern that it is intended primarily for statisticians of the last-mentioned type. It lays out before him the tools of the trade; it explains very fully and carefully the manner of handling each tool; it affords practice in the use of each. While it has little to say of the tool-maker's art, it takes great pains to make clear the use and limitations of each tool. As any one can readily see who has tried to teach statistics to the class of students who most need to know the subject, this book is the product of a genuine teacher's experience, and is exceptionally well adapted to the student's use. To an unusual degree, it succeeds in meeting the student upon his own ground.


R. S. WOODWORTH
COLUMBIA UNIVERSITY 
(1926) 


PREFACE
TO THIRD EDITION


In this edition much of the text has been rewritten and various
procedures brought up to date. Earlier chapters dealing with 
the frequency distribution have been changed the least, later 
chapters dealing with sampling and correlation have been 
changed the most. Several methods and formulas of limited 
application have been omitted in favor of more useful tech- 
niques. The new material includes small sample methods; a 
chapter (Chapter VIII) dealing with the testing of experimental 
hypotheses; a more complete treatment of the Chi-square test;
an introduction to analysis of variance; and the Wherry- 
Doolittle method of test selection. 

As before, I am indebted to Dean J. E. Walker of the Uni-
versity of Arizona and to Professor Vernon W. Lemmon of 
Washington University for advice and suggestions of various 
sorts. My colleagues, Dr. W. N. Schoenfeld, Dr. Joseph Zubin, 
and Mr. Ralph F. Hefferline, have read most of the manuscript 
and have offered many constructive criticisms. 


HENRY E. GARRETT
COLUMBIA UNIVERSITY
(1946)


TO THE INSTRUCTOR 


This book contains more material than can, perhaps, be covered thoroughly in a one-semester course. The following selection of topics is suggested, therefore, as meeting the requirements of a course in “minimum essentials.”


Chapters I, II, and III
Chapter IV (I and II)
Chapter V (I and II)
Chapter VI (II)
Chapter VII (I, II, III, and IV)
Chapter VIII (I and II)
Chapter IX
Chapter X (I and II)
Chapter XI (I)
Chapter XIII (I and II)


CONTENTS


SECTION                                                          PAGE

CHAPTER I
THE FREQUENCY DISTRIBUTION

I. MEASURES IN GENERAL
II. DRAWING UP A FREQUENCY DISTRIBUTION
III. THE GRAPHIC REPRESENTATION OF THE FREQUENCY DISTRIBUTION
IV. STANDARDS OF ACCURACY IN COMPUTATION

CHAPTER II
MEASURES OF CENTRAL TENDENCY

I. CALCULATION OF MEASURES OF CENTRAL TENDENCY
II. CALCULATION OF THE MEAN BY THE “ASSUMED MEAN” OR SHORT METHOD
III. WHEN TO USE THE VARIOUS MEASURES OF CENTRAL TENDENCY

CHAPTER III
MEASURES OF VARIABILITY

I. CALCULATION OF MEASURES OF VARIABILITY
II. CALCULATION OF THE SD BY THE SHORT METHOD
III. THE COEFFICIENT OF VARIATION, V
IV. THE SHORT METHOD APPLIED TO DISCRETE SERIES
V. WHEN TO USE THE VARIOUS MEASURES OF VARIABILITY

CHAPTER IV
CUMULATIVE DISTRIBUTIONS, GRAPHIC METHODS, AND PERCENTILES

I. THE CUMULATIVE FREQUENCY GRAPH
II. PERCENTILES AND PERCENTILE RANKS
III. THE CUMULATIVE PERCENTAGE CURVE OR OGIVE                     83
IV. OTHER GRAPHICAL METHODS                                        93

CHAPTER V
THE NORMAL PROBABILITY CURVE

I. THE MEANING AND IMPORTANCE OF THE NORMAL PROBABILITY
   DISTRIBUTION                                                   102
II. PROPERTIES OF THE NORMAL PROBABILITY DISTRIBUTION             113
III. MEASURING DIVERGENCE FROM NORMALITY                          119
IV. WHY FREQUENCY DISTRIBUTIONS DEVIATE FROM THE NORMAL FORM      127

CHAPTER VI
APPLICATIONS OF THE NORMAL PROBABILITY CURVE

I. PROBLEMS INVOLVING PROPORTIONS OF AREA WITHIN DIFFERENT PARTS
   OF THE NORMAL DISTRIBUTION                                     135
II. THE SCALING OF TEST ITEMS                                     146
III. THE TRANSFORMATION OF MEASURES BY RELATIVE POSITION INTO
   UNITS OF AMOUNT                                                160

CHAPTER VII
SAMPLING AND RELIABILITY

I. THE MEANING OF RELIABILITY                                     181
II. THE RELIABILITY OF MEASURES OF CENTRAL TENDENCY               182
III. THE RELIABILITY OF MEASURES OF VARIABILITY                   194
IV. THE RELIABILITY OF THE DIFFERENCE BETWEEN TWO MEASURES        197
V. THE RELIABILITY OF CERTAIN OTHER MEASURES                      218
VI. SAMPLING AND THE USE OF RELIABILITY FORMULAS                  222

CHAPTER VIII
TESTING EXPERIMENTAL HYPOTHESES

I. THE NULL HYPOTHESIS
II. THE χ² (CHI-SQUARE) TEST
III. THE ANALYSIS OF VARIANCE

CHAPTER IX
LINEAR CORRELATION

I. THE MEANING OF CORRELATION
II. THE COEFFICIENT OF CORRELATION
III. THE CALCULATION OF THE COEFFICIENT OF CORRELATION BY THE
   PRODUCT-MOMENT METHOD
IV. THE RELIABILITY OF THE COEFFICIENT OF CORRELATION

CHAPTER X
REGRESSION AND PREDICTION

I. THE REGRESSION EQUATIONS
II. THE RELIABILITY OF PREDICTIONS
III. THE EFFECT OF VARIABILITY OF … UPON THE SIZE OF r
IV. THE SOLUTION OF A SECOND CORRELATION PROBLEM
V. THE … OF THE COEFFICIENT OF CORRELATION

CHAPTER XI
FURTHER METHODS OF CORRELATION

I. COMPUTING CORRELATION FROM RANKS
II. MEASURING CORRELATION FROM DATA … INTO CATEGORIES
III. CURVILINEAR OR NON-LINEAR RELATIONSHIP

CHAPTER XII
THE RELIABILITY AND VALIDITY OF TEST SCORES

I. THE RELIABILITY OF TEST SCORES                                 380
II. THE VALIDITY OF TEST SCORES                                   394
III. ITEM ANALYSIS                                                399

CHAPTER XIII
PARTIAL AND MULTIPLE CORRELATION

I. THE MEANING OF PARTIAL AND MULTIPLE CORRELATION                404
II. AN ILLUSTRATIVE PROBLEM INVOLVING THREE VARIABLES             406
III. GENERAL FORMULAS FOR USE IN PARTIAL AND MULTIPLE
   CORRELATION                                                    414
IV. SPURIOUS CORRELATION                                          429

CHAPTER XIV
MULTIPLE CORRELATION IN TEST SELECTION

I. THE WHERRY-DOOLITTLE TEST SELECTION METHOD                     435
II. APPLICATIONS OF PARTIAL AND MULTIPLE CORRELATION

REFERENCE TABLES
TABLES OF SQUARES AND SQUARE ROOTS


STATISTICS IN PSYCHOLOGY 
AND EDUCATION 


CHAPTER I 
THE FREQUENCY DISTRIBUTION 


I. MEASURES IN GENERAL 


1. What Is Meant by Measurement 
The measurement of individuals and objects may be of various
kinds, and may be taken to varying degrees of precision. When 
individuals have been ranked or arranged in a series with 
respect to some attribute or trait, we have perhaps the simplest 
sort of measurement. Children may be put in order for height, 
weight, or regularity of school attendance; salesmen may be 
ranked for years of experience, or amount of sales over a year; 
advertisements or pictures may be ranked for amount of color, 
or for cost, or for sales appeal. Rank order tells us, in a rough 
way, how much of an attribute a given person or thing pos- 
sesses. But it tells us little else except serial position in a group. 
We cannot add or subtract ranks as we can inches or pounds:
a person's rank is always relative to the ranks of other mem-
bers of his group, and is never absolute, i.e., in terms of some
known unit.

Measurements of individuals may also be expressed as scores. Scores are usually given in terms of time taken to complete a task, or amount done in a given time; less often scores are expressed in terms of difficulty of the task performed, or excellence of the final result. Scores vary with performance, although score-changes probably do not parallel performance-changes exactly. When scores are expressed in equal units, they constitute a scale. Scaled tests in psychology and education have equal units or steps but do not possess an absolute zero point. On the other hand, the “c.g.s. scales” (centimeters, grams, seconds) of physics do have equal units and an absolute zero point. “Scores” from physical scales are called measures; they may be added or subtracted, and a “score” of twenty inches, say, is twice a “score” of ten inches. Scaled scores from mental tests may also be added or subtracted just as we add and subtract inches. But we cannot say that a score of 40 achieved on a test is twice as good as a score of 20, since neither is measured from a zero point of just no ability. Traits and other characteristics, determinations of which are expressible as scores or measures, are known generally as variables.


2. Continuous and Discrete Series 


In the measurement of mental and social traits, most of the variables with which we deal fall into continuous series. A continuous series is one which is capable of any degree of subdivision, although in practice divisions smaller than some convenient unit are rarely employed. Measurements of general intelligence illustrate scores which fall into continuous series. I.Q.'s, for example, may be thought of as increasing by increments of 1 on an ability continuum which extends from the idiot to the genius. But there is no reason why with more refined methods of measurement we should not be able to get I.Q.'s of 100.8 or even of 100.83. Physical measures such as height, weight, and cephalic index as well as scores from mental and educational tests fall into continuous series: within the given range any measure, integral or fractional, may exist and have meaning. When gaps occur in a truly continuous series these are to be attributed to a failure to measure enough cases, to the relative crudity of the measuring instrument, or to some other factor of a like sort, rather than to the lack of measures within the gaps.
Not all variables fall into continuous series. A salary scale in a department store may run from $10 per week to $20 per week in units of $1; no one receives, let us say, $17.53 per week. Again, the average family in a certain locality may work out mathematically to have 2.57 children, although there is obviously a real gap between two children and three children. Series which exhibit real gaps are called discrete or discontinuous.

It is perhaps fortunate that nearly all of the variables with 
which we deal in psychology and education fall into continuous 
series or may be profitably treated as continuous. This makes 
it possible for us to concern ourselves for the present with 
methods of handling continuous data, and to postpone the 
discussion of discrete data to a later page (p. 68).

In the following sections we shall define more precisely just 
what is meant by a score in a continuous series, and then show 
how scores may be classified into what is called a frequency 
distribution. 


3. The Meaning of Scores in Continuous Series 

Scores or other numbers in continuous series are to be thought of as distances along a continuum, rather than as discrete points. An inch is the linear magnitude between two divisions on a foot-rule; and, in like manner, a score in a mental test is a unit distance between two limits. A score of 150 upon an intelligence examination, for example, represents the interval 149.5 up to 150.5. The exact midpoint of this score-interval is 150, as shown below.


              Score 150
    |--------------+--------------|
  149.5           150           150.5


Other scores are to be interpreted in the same way. A score of 8 on the Thorndike Handwriting Scale, for instance, includes all values from 7.5 up to 8.5; i.e., any value from a point .5 unit below 8, to .5 unit above 8. Hence, 7.7, 8.0, and 8.4 may all be scored 8. An interval extending from .5 unit below to .5 unit above the given value is the usual mathematical meaning of a single score.

There is another, somewhat different meaning which a test score may have. According to this second view, a score of 150 means that an individual has done at least 150 items correctly, but not 151. Hence, a score of 150 represents any value between 150 and 151. Any fractional value greater than 150 but less than 151, e.g., 150.3 or 150.8, since it falls within the interval 150-151, is scored simply as 150. The middle of the score interval is 150.5. (See below.)


              Score 150
    |--------------+--------------|
   150           150.5           151


Both of these ways of defining a score are valid and useful. Which to use will depend upon the way in which the test is scored and on the meaning of the units of measurement employed. If each of ten boys is recorded as having a height of sixty-four inches, this will ordinarily mean that these heights fall between 63.5 and 64.5 inches (middle value 64 in.), and not between sixty-four and sixty-five inches (middle value 64.5 in.). On the other hand, the ages of twenty-five children, all recorded as being nine years old, will most probably lie between nine and ten years; that is, they will be greater than nine and less than ten years (middle value 9.5). But “nine years old” must be taken in many studies to mean 8.5 up to 9.5 years, with a middle value of nine years. The point to remember is that results obtained from treating scores under our second definition will always be .5 unit higher than results obtained when scores are taken under the first or mathematical definition. The student will often have to decide, perhaps somewhat arbitrarily, which meaning a score should have. As a general rule it is safer to take the first meaning of a score unless clearly indicated otherwise. This will be the method followed throughout this book. That is, scores of 62 and 231, say, will usually mean 61.5 up to 62.5, and 230.5 up to 231.5, and not 62 up to 63, and 231 up to 232.
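The two readings of a score can be set side by side in a short Python sketch. It assumes whole-number scores, and the names `exact_limits`, `counting_limits`, and `midpoint` are illustrative only. The printed shift confirms that the middle value under the second definition lies .5 unit above the middle value under the first.

```python
def exact_limits(score):
    """First (mathematical) meaning: the score covers score - .5 up to score + .5."""
    return (score - 0.5, score + 0.5)


def counting_limits(score):
    """Second meaning: at least `score` items done correctly, but not score + 1."""
    return (score, score + 1)


def midpoint(limits):
    """Middle value of an interval given as (lower, upper)."""
    lower, upper = limits
    return (lower + upper) / 2


for s in (150, 62, 231):
    shift = midpoint(counting_limits(s)) - midpoint(exact_limits(s))
    print(s, exact_limits(s), counting_limits(s), shift)
# first line printed: 150 (149.5, 150.5) (150, 151) 0.5
```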


II. DRAWING UP A FREQUENCY DISTRIBUTION
1. The Classification of Measures 


Data collected from tests and experiments often have little meaning or significance until they have been rearranged or classified in a systematic way. The first task that confronts us, then, is the organization of our material, and this leads naturally to a grouping of the measures or scores into classes or categories. The procedure in grouping falls under three main heads:

(1) Determination of the range or the interval between the largest and smallest scores. The range is found by subtracting the smallest from the largest score.

(2) Decision as to the number and size of the groups to be used in classification. The number and size of these class-intervals will depend upon the range of scores and the kind of measures with which we are dealing.

(3) Tabulation of the separate scores within their proper class-intervals.

These three principles of classification are illustrated in Table 1. The figures in this table represent the Army Alpha scores earned by fifty college men. Since the highest score is 197 and the lowest 142, the range (197 − 142) is exactly 55. In deciding upon the number of classes to be used in grouping, a good general rule is to select by trial an interval which will yield not more than twenty nor fewer than ten classes.*

The number of class-intervals which a given range will yield can be determined approximately (within one interval) by dividing the range by the interval tentatively chosen. In the present problem, 55 (the range) divided by 5 (the interval) gives 11, which is one less than the actual number of intervals, namely, 12. An interval of three units will yield nineteen classes; an interval of ten units, six classes.
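The range and class-interval arithmetic can be sketched in Python. The sketch assumes, as in Table 1, that each interval begins at a multiple of the chosen width (140, 145, ... for a width of 5); `score_range` and `count_intervals` are hypothetical helper names. The quick estimate, range divided by width, comes within one interval of the exact count, as stated above.

```python
HIGHEST, LOWEST = 197, 142   # extreme Army Alpha scores of Table 1


def score_range(highest, lowest):
    """Range = largest score minus smallest score."""
    return highest - lowest


def count_intervals(highest, lowest, width):
    """Number of class-intervals of `width` units needed to hold every score,
    assuming each interval starts at a multiple of the width."""
    first_start = (lowest // width) * width   # 142 -> 140 when width is 5
    last_start = (highest // width) * width   # 197 -> 195 when width is 5
    return (last_start - first_start) // width + 1


print(score_range(HIGHEST, LOWEST))   # 55
for width in (3, 5, 10):
    rough = score_range(HIGHEST, LOWEST) / width   # the quick estimate
    print(width, round(rough, 1), count_intervals(HIGHEST, LOWEST, width))
```

With widths of 3, 5, and 10 the exact counts come out as 19, 12, and 6 classes, the figures given in the text.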

The tabulation of the separate scores within their class-intervals is shown in Table 1. In the first column of this table the class-intervals have been listed serially from the smallest score at the bottom of the column to the largest score at the top. Each class-interval comprises exactly five scores. The first interval “140 up to 145” begins with score 140 and ends with 144, thus including the five scores 140, 141, 142, 143, and 144. The second interval “145 up to 150” begins with 145 and ends with 149, i.e., at score 150. The last interval “195 up to 200” begins with score 195 and ends at score 200, thus including the scores 195, 196, 197, 198, 199. In column (2), marked “Tallies,” the separate scores have been listed opposite their proper intervals. The first score, 185, is represented by a tally placed opposite interval “185 up to 190”; the second score, 147, by a tally placed opposite interval “145 up to 150”; and the third score, 173, by a tally placed opposite “170 up to 175.” The remaining scores have been tabulated in the same way. When all fifty scores have been listed, the total number of tallies on each class-interval (i.e., the frequency) is entered in column (3), headed f (frequency). The sum of the f column is called N. When the total frequency within each class-interval has been tabulated opposite the proper interval, as shown in column (3), our fifty Army Alpha scores are arranged in a frequency distribution.

* This rule must often be broken when the number of scores is very large or very small.


TABLE 1

THE TABULATION OF ARMY ALPHA SCORES MADE BY
FIFTY COLLEGE STUDENTS

1. The original scores ungrouped

 185   166   176   145   166   191   177   164   171   174
 147   178   176   142#  170   158   171   167   180   178
 173   148   168   187   181   172   165   169   173   184
 175   156   158   187   156   172   162   193   173   183
 197*  181   151   161   153   172   162   179   188   179

* Highest score     # Lowest score

2. The same fifty scores grouped into a frequency distribution

      (1)                  (2)               (3)
Class-Intervals          Tallies        f (frequency)
195 up to 200            /                     1
190  "   "  195          //                    2
185  "   "  190          ////                  4
180  "   "  185          /////                 5
175  "   "  180          ///// ///             8
170  "   "  175          ///// /////          10
165  "   "  170          ///// /               6
160  "   "  165          ////                  4
155  "   "  160          ////                  4
150  "   "  155          //                    2
145  "   "  150          ///                   3
140  "   "  145          /                     1
                                          N = 50
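The tallying step can be sketched in Python with `collections.Counter`. The sketch assumes the seventh score in the second row of Table 1 is 171, the value implied by the ten scores listed for interval “170 up to 175” later in this chapter; the name `frequency_distribution` is illustrative. Each score is counted under the interval whose lower score limit is the largest multiple of 5 not exceeding it, and the rows are listed from the top interval down, as in Table 1.

```python
from collections import Counter

# Table 1 scores, row by row as printed (the text reads them column by column,
# which does not change the counts).
SCORES = [
    185, 166, 176, 145, 166, 191, 177, 164, 171, 174,
    147, 178, 176, 142, 170, 158, 171, 167, 180, 178,
    173, 148, 168, 187, 181, 172, 165, 169, 173, 184,
    175, 156, 158, 187, 156, 172, 162, 193, 173, 183,
    197, 181, 151, 161, 153, 172, 162, 179, 188, 179,
]


def frequency_distribution(scores, width=5):
    """Return (start, end_exclusive, f) rows from the highest interval down."""
    tallies = Counter((s // width) * width for s in scores)
    low, high = min(tallies), max(tallies)
    return [(start, start + width, tallies.get(start, 0))
            for start in range(high, low - 1, -width)]


table = frequency_distribution(SCORES)
for start, end, f in table:
    print(f"{start} up to {end}  {'/' * f:<10}  {f}")
print("N =", sum(f for _, _, f in table))
```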

The reader will note that the beginning score of the first interval in the distribution (140 up to 145) has been set at 140, although the lowest score in the series is 142. When the interval selected for tabulation is five units, it facilitates tabulation as well as computations which come later if the score limits of the first interval, and, accordingly, of each successive interval, are multiples of five. A class-interval “142 up to 147” is just as good theoretically as a class-interval “140 up to 145”; but the second is easier to handle from the standpoint of the arithmetic involved.


2. Methods of Describing the Limits of the Class-Intervals in 
a Frequency Distribution 


Table 2 illustrates three ways of expressing the limits of the 
class-intervals in a frequency distribution. In (A), the interval 
“140 up to 145” means, as we have already seen, that all scores 
from 140 up to but not including 145 fall within this grouping. 
The intervals in (B) cover the same distances as in (A), but the 
upper and lower limits of each interval are defined more exactly. 
We have seen (p. 3) that a score of 140 in a continuous series 
ordinarily means the interval 139.5 up to 140.5; and that a 
score of 144 means 143.5 up to 144.5. Accordingly, to express 
precisely the fact that an interval begins with 140 and ends with 
144, we may write 139.5 (the beginning of score 140) as the 
lower limit, and 144.5 (end of score 144 or beginning of score 
145) as the upper limit of this step. The class-intervals in (C) 
express the same facts more clearly than in (A) and less exactly 
than in (B). Thus, “140-144” means that this interval be- 
gins with score 140 and ends with score 144; but the precise 
limits of the interval are not given. The diagram below will 
show how (A), (B), and (C) are three ways of expressing iden- 
tically the same facts: 


          Class-Interval
    (A)   140 up to 145
    (B)   139.5 up to 144.5
    (C)   140-144

Interval                                          Interval
Begins     1       2       3       4       5      Ends
139.5     140     141     142     143     144     144.5


TABLE 2

METHODS OF GROUPING SCORES INTO A FREQUENCY DISTRIBUTION

(The data are the fifty Army Alpha scores tabulated in Table 1, p. 6)

           (A)                          (B)                           (C)
Class-Intervals  Midpoint  f   Class-Intervals    Midpoint  f   Class-Intervals  Midpoint  f
195 up to 200      197     1   194.5 up to 199.5    197     1   195-199            197     1
190  "   "  195    192     2   189.5  "   "  194.5  192     2   190-194            192     2
185  "   "  190    187     4   184.5  "   "  189.5  187     4   185-189            187     4
180  "   "  185    182     5   179.5  "   "  184.5  182     5   180-184            182     5
175  "   "  180    177     8   174.5  "   "  179.5  177     8   175-179            177     8
170  "   "  175    172    10   169.5  "   "  174.5  172    10   170-174            172    10
165  "   "  170    167     6   164.5  "   "  169.5  167     6   165-169            167     6
160  "   "  165    162     4   159.5  "   "  164.5  162     4   160-164            162     4
155  "   "  160    157     4   154.5  "   "  159.5  157     4   155-159            157     4
150  "   "  155    152     2   149.5  "   "  154.5  152     2   150-154            152     2
145  "   "  150    147     3   144.5  "   "  149.5  147     3   145-149            147     3
140  "   "  145    142     1   139.5  "   "  144.5  142     1   140-144            142     1
                      N = 50                          N = 50                        N = 50


For the rapid tabulation of scores within their proper intervals, method (C) is to be preferred to (B) or (A). In (A) it is fairly easy, even when one is on guard, to let a score of 160, say, slip into the interval “155 up to 160,” owing simply to the presence of 160 at the upper limit of the interval. Method (B) is clumsy and time-consuming because of the need for writing .5 at the beginning and end of every interval. Method (C), while easiest for tabulation, offers the difficulty that in later calculations one must constantly remember that the expressed class limits are not the actual class limits: that interval “140-144” begins at 139.5 (not 140) and ends at 144.5 (not 144). If this is clearly understood, method (C) is as accurate as (B) or (A). It will be generally used throughout this book.
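The relation among the three notations of Table 2 can be made explicit in a small Python sketch; `interval_notations` and `exact_limits_from_c` are hypothetical names. It assumes integer score limits and, by default, a width of 5, and shows how the actual limits, .5 unit outside the expressed limits of a method (C) label, are recovered.

```python
def interval_notations(start, width=5):
    """Return the method (A), (B), and (C) labels for one class-interval."""
    end = start + width                        # first score NOT in the interval
    a = f"{start} up to {end}"                 # method (A)
    b = f"{start - 0.5} up to {end - 0.5}"     # method (B): exact limits
    c = f"{start}-{end - 1}"                   # method (C): first and last scores
    return a, b, c


def exact_limits_from_c(label):
    """Recover the actual limits behind a method (C) label such as '140-144'."""
    first, last = (int(part) for part in label.split("-"))
    return first - 0.5, last + 0.5


print(interval_notations(140))         # ('140 up to 145', '139.5 up to 144.5', '140-144')
print(exact_limits_from_c("140-144"))  # (139.5, 144.5)
```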

The scores grouped within a given interval in a frequency distribution are assumed to be spread evenly over the entire interval. This assumption is made whether the interval is three, five, or ten units. If we wish to represent all of the scores within a given interval by some single value, the midpoint of the interval is taken to be the logical choice. For example, in the interval 175-179 [Table 2, method (C)] all of the eight scores upon this interval are represented by the single value 177, the midpoint of the interval.* Why 177 is the midpoint of this interval is shown graphically below:


                         Midpoint
                            |
Interval                                          Interval
Begins     1       2       3       4       5      Ends
174.5     175     176     177     178     179     179.5


A simple rule for finding the midpoint of an interval is

$$\text{Midpoint} = \text{lower limit of interval} + \frac{\text{upper limit} - \text{lower limit}}{2}$$

In our illustration, $174.5 + \frac{179.5 - 174.5}{2} = 177$. Since the interval is five units, it follows that the midpoint must be 2.5 units from the lower limit of the class, i.e., 174.5 + 2.5; or 2.5 units from the upper limit of the class, i.e., 179.5 − 2.5.
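The midpoint rule is a one-line computation. This Python sketch assumes the exact (method B) limits are supplied; `midpoint` is an illustrative name. Applied to every interval of Table 2, it reproduces the midpoint column.

```python
def midpoint(lower, upper):
    """Midpoint = lower limit + (upper limit - lower limit) / 2, using exact limits."""
    return lower + (upper - lower) / 2


# Interval 175-179 of Table 2, method (C): exact limits 174.5 and 179.5.
print(midpoint(174.5, 179.5))   # 177.0

# All twelve intervals of Table 2, from 140-144 up to 195-199.
print([midpoint(s - 0.5, s + 4.5) for s in range(140, 200, 5)])
```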

It is often a question whether the midpoint is, in fact, fairly representative of all of the scores upon a given interval. Referring to Table 1, we find that of the ten scores in the class-interval “170 up to 175” (midpoint 172), three (170, 171, 171) are below the midpoint; three (172, 172, 172) are on the midpoint; and four (173, 173, 173, 174) are above the midpoint. Of the five scores upon interval “180 up to 185,” three (180, 181, 181) are below the midpoint (182); and two (183, 184) are above. The single score of 197 upon interval “195 up to 200” falls exactly on the midpoint. In these examples the midpoint represents quite adequately the scores within the given intervals; but it must be admitted that the balancing of scores above and below the midpoint is not always so satisfactory as it is here.
When the data are scanty, or when the distribution is badly skewed, there may be many more scores on one side of the midpoint than on the other. When this happens, the midpoint will not fairly represent all of the scores within the interval.

The assumption that the midpoint is the most representative value in an interval holds best when the number of scores in the distribution is large, and when the intervals are not too wide. But even when neither of these conditions fully obtains, the midpoint assumption is not greatly in error and is the best that we can make. In the long run, about as many scores will fall above as below the various midpoint values; and a lack of balance in one interval will usually be offset by the opposite condition in another interval.

* The same value (namely, 177) is, of course, the midpoint of the interval when methods (A) and (B) are used.
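How well a midpoint balances its interval can be checked directly. The Python sketch below uses the fifty scores of Table 1, taking the seventh score of the second row as 171 (the value implied by the ten scores listed for interval “170 up to 175”). For an interval of width 5 beginning at score `start`, it splits the scores into those below, on, and above the midpoint; `split_around_midpoint` is a hypothetical name.

```python
SCORES = [
    185, 166, 176, 145, 166, 191, 177, 164, 171, 174,
    147, 178, 176, 142, 170, 158, 171, 167, 180, 178,
    173, 148, 168, 187, 181, 172, 165, 169, 173, 184,
    175, 156, 158, 187, 156, 172, 162, 193, 173, 183,
    197, 181, 151, 161, 153, 172, 162, 179, 188, 179,
]


def split_around_midpoint(scores, start, width=5):
    """Return (below, on, above) lists for the interval start .. start + width - 1."""
    mid = start + (width - 1) / 2                  # e.g. 170-174 -> 172.0
    inside = [s for s in scores if start <= s < start + width]
    below = sorted(s for s in inside if s < mid)
    on = sorted(s for s in inside if s == mid)
    above = sorted(s for s in inside if s > mid)
    return below, on, above


for start in (170, 180, 195):
    print(start, split_around_midpoint(SCORES, start))
```

The three intervals printed are the ones discussed above, and the counts agree with those given in the text.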
Measures of central tendency (p. 32) and of variability (p. 49) calculated from data grouped into intervals of five units, say, will usually vary slightly from the same measures calculated from these data when ungrouped, or when grouped into intervals of, say, three or ten units. These variations arise from (1) differences in the size of the groups in which the data are classified, and (2) the fact that each score within an interval is assigned the value of the middle of the interval instead of its actual value. Corrections are sometimes applied to the measures of variability to correct the grouping error thus introduced. But usually the error which results from grouping is so small that it may be neglected in ordinary statistical work.
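As a rough check on the size of the grouping error, the Python sketch below computes the arithmetic mean (treated in Chapter II) of the Table 1 scores three ways: ungrouped, grouped into five-unit intervals beginning at 140, and grouped into ten-unit intervals beginning at 140. Every grouped score is given the value of its interval's midpoint. The sketch takes the seventh score of the second row of Table 1 as 171, and `grouped_mean` is an illustrative name. For these particular data the deviations from the midpoints of the five-unit intervals happen to cancel exactly, so grouping by fives reproduces the ungrouped mean of 170.8, while grouping by tens gives 170.7.

```python
SCORES = [
    185, 166, 176, 145, 166, 191, 177, 164, 171, 174,
    147, 178, 176, 142, 170, 158, 171, 167, 180, 178,
    173, 148, 168, 187, 181, 172, 165, 169, 173, 184,
    175, 156, 158, 187, 156, 172, 162, 193, 173, 183,
    197, 181, 151, 161, 153, 172, 162, 179, 188, 179,
]


def grouped_mean(scores, width, first_start):
    """Mean computed as if each score lay at the midpoint of its interval.
    Intervals begin at first_start, first_start + width, ...; the midpoint is
    the exact lower limit (start - .5) plus half the width."""
    total = 0.0
    for s in scores:
        start = first_start + ((s - first_start) // width) * width
        total += (start - 0.5) + width / 2
    return total / len(scores)


raw_mean = sum(SCORES) / len(SCORES)
print(round(raw_mean, 2))                        # 170.8
print(round(grouped_mean(SCORES, 5, 140), 2))    # 170.8
print(round(grouped_mean(SCORES, 10, 140), 2))   # 170.7
```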


III. THE GRAPHIC REPRESENTATION OF THE FREQUENCY DISTRIBUTION


… because these … the most … notice. For … of visual presentation; and, at the … abstract and difficult of interpretation, into more concrete and understandable form.

Four methods of representing a frequency distribution graphically are in general use. These methods yield the frequency polygon, the histogram, the cumulative frequency graph, and the cumulative percentage curve or ogive. The first two graphic devices will be treated in the following sections; the last two in Chapter IV.


1. Graphical Representation of Data; General Principles

Before considering methods of constructing a frequency poly- 
gon or histogram, we shall review briefly the simple algebraic 
principles which apply to all graphical representation of data. 
Graphing or plotting is done with reference to two lines or coördinate axes, the one the vertical or Y-axis, the other the horizontal or X-axis. These basic lines are perpendicular to each other, the point where they intersect being called O, or the origin. Figure 1 represents a system of coördinate axes.

The origin is the zero point or point of reference for both 
axes. Distances measured along the X-axis to the right of O 
are called positive, distances measured along the X-axis to the 
left of O negative. In the same way, distances measured on 
the Y-axis above O are positive; distances below O negative. By their intersection at O, the X- and Y-axes form four divisions or quadrants. In the upper right division or first quadrant (see Fig. 1), both x and y measures are positive (+ +). In the upper left division or second quadrant, x is minus and y plus (− +). In the lower left or third quadrant, both x and y are negative (− −); while in the lower right or fourth quadrant, x is plus and y minus (+ −).

To locate or plot a point “A” whose coördinates are x = 4 and y = 3, we go out from O four units on the X-axis, and up from the origin three units on the Y-axis. Where the perpendiculars to these points intersect, we locate the point “A” (see Fig. 1). The point “B,” whose coördinates are x = −5 and y = −7, is plotted in the third quadrant by going left from O along the X-axis five units, and then down seven units, as shown in the figure. In like manner, any points “C” and “D” whose x and y values are known can be located with reference to OY and OX, the coördinate axes. The distance of a point from O on the X-axis is commonly called the abscissa; and the distance of the point from O on the Y-axis the ordinate. The abscissa of point “D” is +9, and the ordinate, −2.

Fig. 1. A System of Coördinate Axes.
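The sign conventions for the four quadrants can be checked mechanically. The short Python sketch below is only an illustration (the function name `quadrant` is ours, not the text's); it reports the quadrant of a point from its abscissa and ordinate, using the points A, B, and D of Figure 1.

```python
def quadrant(x: float, y: float) -> int:
    """Return the quadrant (1-4) of the point (x, y), or 0 if it lies on an axis."""
    if x == 0 or y == 0:
        return 0
    if x > 0:
        return 1 if y > 0 else 4   # (+ +) first quadrant, (+ -) fourth
    return 2 if y > 0 else 3       # (- +) second quadrant, (- -) third


# Points from Figure 1: abscissa first, ordinate second.
points = {"A": (4, 3), "B": (-5, -7), "D": (9, -2)}
for name, (x, y) in points.items():
    print(f"{name}: abscissa {x}, ordinate {y}, quadrant {quadrant(x, y)}")
```

Point A falls in the first quadrant, B in the third, and D in the fourth, in agreement with the figure.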




the origin; and the frequencies within each interval are measured off upon the Y-axis. There is one score on the first interval, 140 up to 145 (Table 1, p. 6). To represent this score on the diagram, we go out on the X-axis to 142, midway between 139.5 and 144.5, and count up one Y-unit. The frequency on the next interval, 145 up to 150, is three; hence the second point falls midway between 144.5 and 149.5, three units above the X-axis. The two scores on interval 150 up to 155, the four scores on 155 up to 160, and the frequency on each succeeding interval, are represented in every case by a point the specified number of scores (Y-units) above the X-axis, and midway between the upper and lower limits of the interval upon which the f lies. It is important in plotting a frequency polygon to remember that the midpoint of an interval is always taken to represent the entire interval. The height of the ordinate at the midpoint represents all of the scores within the given interval.

When all of the points have been located, they are joined in regular order to give the frequency polygon* shown in Figure 2. In order to complete the figure, one interval (134.5 to 139.5) at the low end, and one interval (199.5 to 204.5) at the high end of the distribution have been included on the X-scale. The frequency on each of these intervals is zero at the midpoint; hence by including them we begin the frequency polygon one-half interval below the first, and end it one-half interval above the last, class-interval on the X-axis.

In order to give symmetry and balance to a polygon, one must exercise care in the selection of unit-distances to represent the intervals on the X-axis and the frequencies on the Y-axis. A too-long X-unit tends to stretch out the polygon, while a too-short X-unit crowds the separate points. On the other hand, a too-long Y-unit exaggerates the changes from interval to interval, and a too-short Y-unit makes the polygon too flat. A good general rule is to select X- and Y-units which will make the height of the figure approximately 75% of its width. The ratio of height to width may vary from 60-80% and the figure still have good proportions; but it can rarely go below 50% and leave the figure well balanced.

* Polygon means “many-sided figure.”

Fig. 2. Frequency Polygon Plotted from the Distribution of Fifty Army Alpha Scores Given in Table 1, page 6. (Y-axis: frequencies; X-axis: scores, marked at the exact interval limits from 134.5 to 204.5, with the mean, 170.8, and the median located.)

TABLE 3
Scores Made by 200 Adults upon a Cancellation Test
(Class-interval = 4)

Class-Intervals (Scores)     Midpoint X      f
135.5 up to 139.5              137.5          3
131.5 up to 135.5              133.5          3
127.5 up to 131.5              129.5         16
123.5 up to 127.5              125.5         28
119.5 up to 123.5              121.5         52
115.5 up to 119.5              117.5         49
111.5 up to 115.5              113.5         27
107.5 up to 111.5              109.5         18
103.5 up to 107.5              105.5          4
                                          N = 200

The frequency polygon in Figure 2 illustrates the “75% rule.” There are thirteen class-intervals laid off on the X-axis: twelve full intervals plus one-half interval at the beginning and at the end of the range. Hence, our polygon should be 75% of thirteen, or about ten X-axis units high. These ten units (each equal to one interval) are laid off on the Y-axis. To determine how many scores (f's) should be assigned to each unit on the Y-axis, we divide 10, the largest f (on interval 169.5 up to 174.5), by 10, the number of intervals laid off on Y. The result (i.e., 1) shows that each Y-unit is exactly equal to one f or score, as shown in Figure 2.

The polygon in Figure 5, page 20, furnishes another illustration of this method of plotting a frequency polygon so as to preserve balance. This polygon represents the distribution of 200 cancellation scores shown in Table 3. There are ten intervals laid off along the base line or X-axis: nine full intervals plus one-half interval at the beginning and at the end of the range. Since 75% of 10 is 7.5, the height of our figure could be either seven or eight X-axis units. To determine the “best” value for each Y-unit, we divide 52, the largest f (on 119.5 up to 123.5), by 7, getting 7.4; and then by 8, getting 6.5. Using whole numbers for convenience, evidently we may lay off on the Y-axis seven units, each representing eight scores; or eight units, each representing seven scores. The first combination was chosen because a unit of eight f's is somewhat easier to handle than one of seven. A slightly longer Y-unit representing ten f's would perhaps have been still more convenient.
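The arithmetic of the “75% rule” can be written out as a small helper. This is a sketch under our own assumptions: the name `y_scale_options` is hypothetical, the width is counted as the number of full intervals plus one (a half interval at each end), and the scores per Y-unit are rounded up to a whole number so that the largest frequency still fits, as in the text's choice of eight rather than 7.4.

```python
import math


def y_scale_options(num_intervals, max_f, ratio=0.75):
    """Suggest (Y-units, scores per Y-unit) pairs for a balanced polygon.

    The width of the figure, in X-units of one class-interval each, is the
    number of full intervals plus a half interval at each end, that is
    num_intervals + 1. The target height is `ratio` times that width; the
    whole numbers just below and just above the target are tried, and the
    scores per unit are rounded up so the largest frequency still fits.
    """
    width = num_intervals + 1
    target = ratio * width
    heights = sorted({math.floor(target), math.ceil(target)})
    return [(h, math.ceil(max_f / h)) for h in heights if h > 0]


print(y_scale_options(12, 10))  # Figure 2: 12 intervals, largest f = 10
print(y_scale_options(9, 52))   # Figure 5: 9 intervals, largest f = 52
```

For Figure 5 the helper returns `[(7, 8), (8, 7)]`, the two combinations discussed above. For Figure 2 it returns `[(9, 2), (10, 1)]`; the text rounds 9.75 up to ten units of one score each.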

The total frequency (N) of a distribution is represented by the area of its polygon; that is, the area bounded by the frequency surface and the X-axis. The area lying above any given interval, however, cannot be taken as proportional to the number of cases within the interval because of the irregularities in the distribution and consequently in the frequency surface. To show the positions of the mean and the median in the graph, we may locate these measures on the X-axis as shown in Figures 2 and 5. Perpendiculars erected at these points show the approximate frequency at the mean and at the median.

Steps involved in constructing a frequency polygon may be 
summarized as follows: 


(1) Draw two straight lines perpendicular to each other, the vertical line near the left side of the paper, the horizontal line near the bottom. Label the vertical line (the Y-axis) OY, and the horizontal line (the X-axis) OX. Put the O where the two lines intersect. This point is the origin.

(2) Lay off the intervals of the frequency distribution at regular distances along the X-axis. Begin with the lower limit of the interval next below the lowest in the distribution, and end with the upper limit of the interval next above the highest in the distribution. Label the successive X distances with the interval limits. Select an X-unit which will allow all of the intervals to be represented easily on the graph paper.

(3) Mark off on the Y-axis successive units to represent the scores (the frequencies) on the different intervals. Choose a Y-scale which will make the largest frequency (the height) of the polygon approximately 75% of the width of the figure.

(4) At the midpoint of each interval on the X-axis go up in the Y direction a distance equal to the number of scores on the interval. Place points at these locations.

(5) Join the points plotted in (4) with straight lines to give the frequency surface.
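Steps (2) and (4) amount to computing one (midpoint, f) point per interval, with an empty interval added at each end. The sketch below does only that computation; the function name `polygon_points` and its parameters are our own, and the frequencies are the Table 1 values listed beside Figure 3.

```python
def polygon_points(lowest_limit, width, freqs):
    """Return the (midpoint, f) points of a frequency polygon.

    lowest_limit -- exact lower limit of the lowest occupied interval (e.g. 139.5)
    width        -- size of the class-interval (e.g. 5)
    freqs        -- frequencies, listed from the lowest interval upward

    One zero-frequency interval is added at each end, so the polygon
    begins and ends on the X-axis, as in step (2).
    """
    padded = [0] + list(freqs) + [0]
    start = lowest_limit - width   # lower limit of the added bottom interval
    return [(start + width * (k + 0.5), f) for k, f in enumerate(padded)]


# Frequencies of the fifty Army Alpha scores (Table 1), lowest interval first.
table1 = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
for midpoint, f in polygon_points(139.5, 5, table1):
    print(midpoint, f)
```

The first point is (137.0, 0) and the last (202.0, 0); joining the points in order with straight lines, by hand or with any plotting library, gives the polygon of Figure 2.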

(2) Smoothing the Frequency Polygon

Because the sample is small (N = 50) and the frequency distribution somewhat irregular, the polygon in Figure 2 tends to be jagged in outline. To iron out chance irregularities, and also get a better notion of how the figure might look if the data were more numerous, the frequency polygon may be “smoothed” as shown in Figure 3, page 17. In smoothing, a series of “moving” or “running” averages are taken from which new or adjusted frequencies are determined. The method is illustrated in Figure 3. To find an adjusted or “smoothed” f, we add together the f on the given interval and the f's on the two adjacent intervals (the one just below and the one just above) and divide the sum by 3. For example, the smoothed f for interval 174.5 up to 179.5 is (8 + 10 + 5)/3 or 7.67; for interval 154.5 up to 159.5, (4 + 2 + 4)/3 or 3.33. The smoothed f's for the other intervals may be found in the table below Figure 3.

Fig. 3. Original and Smoothed Frequency Polygon. (Data from Table 1, p. 6.) The original and smoothed f's are given below.

Scores        f     Smoothed f
200-204       0         .33
195-199       1        1.00
190-194       2        2.33
185-189       4        3.67
180-184       5        5.67
175-179       8        7.67
170-174      10        8.00
165-169       6        6.67
160-164       4        4.67
155-159       4        3.33
150-154       2        3.00
145-149       3        2.00
140-144       1        1.33
135-139       0         .33
             50       50.00

To find the smoothed f for the two intervals at the extremes of the original distribution, namely, 139.5 up to 144.5, and 194.5 up to 199.5, a slightly different procedure is necessary. Here we add 0 (the f on the step below or above, outside the original range), the f on the given step, and the f on the adjacent step, and divide by 3. This procedure makes the smoothed f for 139.5 up to 144.5, (0 + 1 + 3)/3 or 1.33, and the smoothed f for 194.5 up to 199.5, (2 + 1 + 0)/3 or 1.00. The smoothed f for the intervals 134.5 up to 139.5 and 199.5 up to 204.5, for which the frequency in the original distribution is 0, is in each case (1 + 0 + 0)/3 or .33. Note that if we omit these two intervals the N for the smoothed distribution will be less than 50, since the smoothed distribution has frequencies outside the range of the original distribution.
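The running-average procedure, including the zero frequencies assumed beyond each end, can be sketched in a few lines of Python. The function name `smooth` is ours; the frequencies are those of Table 1, listed from 140-144 upward.

```python
def smooth(freqs):
    """Three-interval running average of a frequency distribution.

    freqs -- frequencies listed from the lowest interval upward.
    One zero-frequency interval is assumed beyond each end of the original
    range, so the result has len(freqs) + 2 values, the first and last
    belonging to those added intervals.
    """
    padded = [0, 0] + list(freqs) + [0, 0]
    return [
        (padded[i - 1] + padded[i] + padded[i + 1]) / 3
        for i in range(1, len(padded) - 1)
    ]


table1 = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]   # 140-144 up to 195-199
smoothed = smooth(table1)                        # 135-139 up to 200-204
print([round(s, 2) for s in smoothed])
print(round(sum(smoothed), 2))                   # the total N is preserved: 50.0
```

Because every original f enters exactly three sums, each divided by 3, the smoothed frequencies still total N, which is why the two added end intervals must be kept.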

If the already smoothed f's in Figure 3 are subjected to a
second smoothing, the outline of the frequency surface will be- 
come more nearly a continuous flowing curve. It is doubtful, 
however, whether so much adjustment of the original f's is 
often warranted. When an investigator presents only the 
smoothed frequency polygon and does not give his original 
data, it is impossible for a reader to tell with what he started. 
Moreover, smoothing gives a picture of what an investigator 
might have gotten (not what he did get) if his data had been 
more numerous, or less subject to error than they were. If N 
is large, smoothing may not greatly change the shape of a 
graph, and hence is often unnecessary. The frequency polygon 
in Figure 5, page 20, for example, which represents the distri- 
bution of 200 cancellation test scores, is quite regular without 
any adjustment of the ordinate (i.e., the Y) values. Probably 
the best course for the beginner to follow is to smooth data 
as little as possible. When smoothing seems to be indicated 
in order better to bring out the facts, one should be careful 
always to present original data along with “adjusted” results.




3. The Histogram or Column Diagram 

A second way of representing a frequency distribution graphically is by means of a histogram or column diagram. This type of graph is illustrated in Figure 4, page 20, for the same distribution of scores represented by the frequency polygon in Figure 3, page 17. The two figures are constructed in much the same way, with this important difference: in a frequency polygon all of the scores within a given interval are represented by the midpoint of that interval, while in a histogram the assumption is made that scores are spread uniformly over their intervals. The measures within each interval of a histogram, therefore, are represented by a rectangle, the base of which equals the interval, and the height of which equals the number of scores (the f) within the interval. Thus the one score upon interval 139.5 up to 144.5 is represented by a rectangle whose base equals the length of the interval, and whose height equals one unit measured off on the Y-axis. The three scores within the next interval, 144.5 up to 149.5, are represented by a rectangle one interval long and three Y-units high. The altitudes of the other rectangles vary with the number of f's upon the intervals, the bases all being one interval long. When the same number of scores falls within two or more adjacent intervals, as in the intervals 154.5 up to 159.5 and 159.5 up to 164.5, the top of the rectangle covers two or more intervals on the X-axis. The highest rectangle is, of course, that one (on interval 169.5 up to 174.5) which has 10, the largest frequency, as its altitude. In selecting scales for the X- and Y-axes, the same considerations as to height and width of figure, outlined on page 13 for the frequency polygon, should be observed.

Although in a histogram each interval is represented by a separate rectangle, it is not necessary to project the sides of the rectangles to the base line as is done in Figure 4, page 20. The rise or fall of the boundary line shows the increase or decrease in the number of scores from interval to interval and is usually the important fact to be brought out (see Fig. 5). As in a frequency polygon, the total frequency (N) is represented by the area of the histogram. In contrast to the frequency polygon, however, the area of each rectangle in a histogram is directly proportional to the number of measures within the interval. For this reason, the histogram presents an accurate picture of the relative proportions of the total frequency from interval to interval.

Fig. 4. Histogram of the Fifty Army Alpha Scores Shown in Table 1, page 6. (Y-axis: frequencies; X-axis: scores, with the mean and median located.)

Fig. 5. Frequency Polygon and Histogram of 200 Cancellation Scores Shown in Table 3, page 14. (Y-axis: frequencies; X-axis: scores, marked at the exact interval limits from 103.5 to 139.5.)
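The contrast between the two kinds of area can be made concrete with the Table 3 frequencies. The sketch below (all function names are ours) computes the area of each histogram rectangle, the total area under the polygon by the trapezoidal rule, and the polygon's area between the exact limits of a single interval. The frequency of the lowest interval, 4, is the value needed for the column to total N = 200.

```python
def histogram_areas(freqs, width):
    """Area of each histogram rectangle: base (one interval) times height (f)."""
    return [f * width for f in freqs]


def polygon_area(freqs, width):
    """Total area under a frequency polygon, by the trapezoidal rule.

    The polygon runs through the interval midpoints, with a zero-frequency
    interval added at each end, as in Figures 2 and 5.
    """
    ys = [0] + list(freqs) + [0]
    return sum(width * (a + b) / 2 for a, b in zip(ys, ys[1:]))


def polygon_area_over_interval(freqs, width, k):
    """Polygon area between the exact limits of interval k (0 = lowest).

    The polygon crosses each exact limit halfway between the neighbouring
    ordinates, so the strip over interval k is two trapezoids whose areas
    sum to width * (f_prev + 6 * f + f_next) / 8.
    """
    ys = [0] + list(freqs) + [0]
    f_prev, f, f_next = ys[k], ys[k + 1], ys[k + 2]
    return width * (f_prev + 6 * f + f_next) / 8


# Table 3 (cancellation test), lowest interval 103.5-107.5 first; i = 4.
table3 = [4, 18, 27, 49, 52, 28, 16, 3, 3]
print(sum(histogram_areas(table3, 4)), polygon_area(table3, 4))
print(histogram_areas(table3, 4)[4], polygon_area_over_interval(table3, 4, 4))
```

Both totals equal N × i = 200 × 4 = 800. Over the interval 119.5 up to 123.5, however, the rectangle's area is 208 (52 × 4), exactly proportional to its f, while the polygon's strip has area 194.5. This is the sense in which only the histogram represents each interval's frequency by area.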

In order to provide a more detailed comparison of the two 
types of frequency graph, the distribution in Table 3, page 14, 
is plotted upon the same coördinate axes in Figure 5, page 20,
as a frequency polygon and as a histogram. The increased num- 
ber of cases and the more symmetrical arrangement of scores in 
the distribution make these figures more regular in appearance 
than those in Figures 2 and 4, pages 14 and 20. 


4. Plotting Two Frequency Distributions on the Same Axes, 
When Samples Differ in Size 
Table 4 gives the distributions of scores on an achievement examination made by two groups, A and B, which differ considerably in size. Group A has 60 cases, Group B, 160 cases.


TABLE 4
     (1)            (2)          (3)           (4)            (5)
 Achievement      Group A      Group B       Group A        Group B
 Examination                                Percentage     Percentage
   Scores       Frequencies  Frequencies   Frequencies    Frequencies

    80-89            0            9            0.0            5.6
    70-79            3           12            5.0            7.5
    60-69           10           32           16.7           20.0
    50-59           16           48           26.7           30.0
    40-49           12           27           20.0           16.9
    30-39            9           20           15.0           12.5
    20-29            6           12           10.0            7.5
    10-19            4            0            6.7            0.0
                    60          160          100.1          100.0


If the two distributions in Table 4 are plotted as polygons or as histograms on the same coördinate axes, the fact that the f's of Group B are so much larger than those of Group A makes it hard to compare directly the range and quality of achievement in the two groups. A useful device in cases where the N's differ in size is to express both distributions in percentage frequencies as shown in Table 4. Both N's are now 100, and the f's are comparable from interval to interval. For example, we know at once that 26.7% of Group A and 30% of Group B made scores of 50 through 59, and that 5.0% of the A's and 7.5% of the B's scored from 70 to 79. Frequency polygons representing the two distributions, in which percentage frequencies instead of original f's have been plotted on the same axes, are shown in Figure 6. These polygons provide an immediate comparison of the relative achievement of our two groups, which is not given by polygons plotted from the original f's.

Fig. 6. Frequency Polygons of the Two Distributions in Table 4. Scores are laid off on the X-axis, percentage frequencies on the Y-axis.

Percentage frequencies are readily found by dividing each f by N and multiplying by 100. Thus 3/60 × 100 = 5.0. A simple method of finding percentage frequencies when a calculating machine is available is to divide 100 by N and, with this figure in the machine, multiply each f in turn by it. For example: 1.667 (i.e., 100/60) × 3 = 5.0; 1.667 × 10 = 16.7, etc.; .625 (i.e., 100/160) × 9 = 5.6, .625 × 12 = 7.5, etc. What percentage frequencies do, in effect, is to scale each distribution to the same total N of 100, thus permitting a comparison of f's for each interval.
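The calculating-machine shortcut (compute 100/N once, then multiply each f by it) carries over directly to code. In this sketch the function name `percentage_frequencies` is ours; the frequencies are those of Table 4, highest interval first.

```python
def percentage_frequencies(freqs):
    """Scale a distribution to a total of 100: each f times 100 / N."""
    scale = 100 / sum(freqs)          # the multiplier kept "in the machine"
    return [f * scale for f in freqs]


# Table 4, highest interval (80-89) first.
group_a = [0, 3, 10, 16, 12, 9, 6, 4]     # N = 60
group_b = [9, 12, 32, 48, 27, 20, 12, 0]  # N = 160
for name, freqs in (("A", group_a), ("B", group_b)):
    print(name, [round(p, 1) for p in percentage_frequencies(freqs)])
```

Rounded to one decimal, the results reproduce columns (4) and (5) of Table 4. The unrounded percentages always total exactly 100; column totals such as 100.1 arise only from rounding each entry.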


5. When to Use the Frequency Polygon and When to Use the
Histogram

The question of when to use the frequency polygon and when to use the histogram cannot be answered by a general rule which will cover all cases. The frequency polygon is less exact than the histogram in that it does not represent accurately, i.e., in terms of area, the number of measures within successive intervals. In comparing two or more graphs plotted on the same axes, however, the frequency polygon is the more useful, since the vertical and horizontal lines in two histograms will often coincide. Both the histogram and the frequency polygon tell the same story, and both are useful in enabling us to show in graphic form whether the scores of a group are distributed symmetrically or whether they are piled up at the low or at the high end of the scale. Not only information with regard to the group, but information with regard to the test, may be secured from a graph. If a test is too easy, the scores will crowd the high end of the scale; if the test is too hard, the scores will pile up at the low end of the scale. If the test is well suited to the group, scores will tend to be distributed symmetrically around the mean, a few individuals scoring high, a few low, and the majority scoring somewhere near the middle of the scale. When this happens, the frequency graph approximates the “ideal” or normal frequency curve described in Chapter V.


IV. STANDARDS OF ACCURACY IN COMPUTATION *

“How many places” to carry numerical results is a question which arises persistently in statistical computation. Sometimes a student, by discarding decimals, throws away legitimate data. More often, however, he tends to retain too many decimals, a practice which may give a false appearance of great precision not always justified by the original material.

* This section should be reviewed frequently, and referred to in solving the problems given in succeeding chapters.

In this section are given some of the generally accepted prin- 
ciples which apply to statistical calculation. Observance of 
these rules will lead to greater uniformity in calculation. They 
should be followed carefully in solving the problems given in 
this book. 


1. Rounded Numbers 

In calculation, numbers are usually “rounded” off to the standard of accuracy demanded by the problem. If we round off 8.6354 to two decimals it becomes 8.64; to one decimal, 8.6; to the nearest integer, 9. Measures of central tendency and variability, coefficients of correlation, and other measures are rarely reported to more than two decimal places. A mean of 52.6872, for example, is usually reported as 52.69; a standard deviation of 12.3841 as 12.38; and a coefficient of correlation of .6350 as .63, etc. It is very doubtful whether much of the work in mental measurement warrants accuracy beyond the second decimal. Convenient rules for rounding numbers to two decimals are as follows: when the third decimal is less than 5, drop it; when greater than 5, increase the preceding figure by 1; when it is 5, compute the fourth decimal and correct back to the second place (thus 8.6354 becomes 8.64); when it is exactly 5 followed only by zeros, drop it and make no correction (thus .6350 becomes .63).
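For positive numbers, the rules above coincide with the `ROUND_HALF_DOWN` mode of Python's standard `decimal` module: round to the nearest value, and on an exact tie round toward zero (that is, drop the 5). The sketch below assumes numerals are supplied as strings, so that no binary rounding error enters before the rule is applied; the helper name `round_off` is ours.

```python
from decimal import Decimal, ROUND_HALF_DOWN


def round_off(numeral: str, places: int = 2) -> Decimal:
    """Round a positive decimal numeral by the rules stated above.

    ROUND_HALF_DOWN rounds to the nearest value and, on an exact tie
    (a 5 followed only by zeros), rounds toward zero, i.e. drops the 5.
    """
    step = Decimal(1).scaleb(-places)      # Decimal("0.01") when places=2
    return Decimal(numeral).quantize(step, rounding=ROUND_HALF_DOWN)


for x in ("8.6354", "52.6872", "12.3841", ".6350"):
    print(x, "->", round_off(x))
print(round_off("8.6354", 1), round_off("8.6354", 0))
```

The mean, standard deviation, and correlation in the paragraph above come out as 52.69, 12.38, and .63, and 8.6354 rounds to 8.64, 8.6, and 9 at two, one, and zero decimals.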


2. Significant Figures 

The measurement 64.3 inches is assumed to be correct to the nearest tenth of an inch, its true value lying somewhere between 64.25 and 64.35 inches. Two places to the left of the decimal point and one to the right are fixed, and hence 64.3 is said to contain three significant figures. The numbers 643 and .643 also contain three significant figures each.

In the number .003046 there are four significant figures, 3, 0, 4, and 6, the first two zeros serving merely to locate the decimal point. When used to locate a decimal point only, a zero is not considered to be a significant figure; .004, for example, has only one significant figure, the two zeros simply fixing the position of 4, the significant digit. The following illustrations should make clear the matter of significant figures:

136      has three significant figures.
136,000  has three significant figures also. The true value of this number lies between 135,500 and 136,500. Only the first three digits are definitely fixed, the zeros serving simply to locate the decimal point or fix the size of the number.
1360.    has four significant figures; the decimal point indicates that the zero in the fourth place is known, and hence significant.
.136     has three significant figures.
.1360    has four significant figures; the zero fixes the fourth place.
.00136   has three significant figures; the first two zeros merely locate the decimal point.
2.00136  has six significant figures; the integer, 2, makes the two zeros to the right of the decimal point significant.
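The conventions illustrated above can be expressed as a small counting function. This is a sketch under stated assumptions: the name `significant_figures` is ours, numerals are given as strings written as in the text, and a whole number written without a decimal point (such as 136,000) is taken to have non-significant trailing zeros.

```python
def significant_figures(numeral: str) -> int:
    """Count significant figures in a numeral written as in the text.

    - Leading zeros only locate the decimal point and are not counted.
    - In a whole number written without a decimal point (136,000),
      trailing zeros only fix the size of the number and are not counted.
    - Once a decimal point is written (1360., .1360), trailing zeros
      are taken as known digits and counted.
    """
    s = numeral.replace(",", "").lstrip("+-")
    has_point = "." in s
    digits = s.replace(".", "").lstrip("0")
    if not has_point:
        digits = digits.rstrip("0")
    return len(digits)


for n in ("136", "136,000", "1360.", ".136", ".1360", ".00136", "2.00136"):
    print(f"{n:>8}  {significant_figures(n)}")
```

Zeros between significant digits, as in .003046 or 2.00136, are kept automatically, because only leading zeros and (for whole numbers) trailing zeros are stripped.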


3. Exact and Approximate Numbers 

It is necessary in calculation to make a distinction between 
exact and approximate numbers. An exact number is one which
is found by counting: ten children, 150 test scores, twenty desks 
are examples. Approximate numbers result from the measure- 
ment of variable quantities. Test scores and other measures, 
for example, are approximate since they are represented by in- 
tervals and not exact points on some scale. Thus a score of 61 
may be any value from 60.5 up to 61.5 and a measured height 
of 47.5 inches may be any value from 47.45 up to 47.55 inches 
(see p. 3). Calculations with exact numbers may, in general, 
be carried to as many decimals as we please, since we may assume as many significant figures as we wish. For example, 110 test scores, which means that exactly 110 subjects were tested, could be written N = 110.000 . . . to n significant figures. Calculations based upon approximate numbers depend upon, and are limited by, the number of significant figures in the numbers which enter into the calculations. This will be clearer in the following “rules”:


4. Rules for Computation 
(1) Accuracy of a Product 

(a) The number of significant figures in the product of two or more approximate numbers will equal the number of significant figures in that one of the numbers which is the least accurate, i.e., which contains the smallest number of significant figures. To illustrate:

125.5 × 7.0 = 880, not 878.5, because 7.0, the less accurate of the two numbers, contains only two significant figures. The number 125.5 contains four significant figures.

125.5 × 7.000 = 878.5. Both numbers now contain four significant figures; hence their product also contains four significant figures.

(b) When multiplying an exact number by an approximate number, the number of significant figures in the product is determined by the number of significant figures in the approximate number. To illustrate:

If each of twelve children (twelve is an exact number) has an M.A. of eight years (eight is an approximate number), the product 12 × 8 must be written either as 90 or 100, since the approximate number has only one significant digit. If, however, each M.A. of eight years can be written as 8.0, the product 12 × 8.0 can be written as 96, since 8.0 contains two significant digits.

(2) Accuracy of a Quotient

(a) When dividing one approximate number by another approximate number, the significant figures in the quotient will equal the significant figures in that one of the two numbers (dividend or divisor) which is less accurate, i.e., which has the smaller number of significant digits. Illustrations:

9.27/41 should be written .23, not .22609, since 41 (the less accurate number) contains only two significant figures.
16/4724 should be written .0034, not .0033869, since 16 (the less accurate number) has two significant figures.


(b) In dividing an approximate number by an exact number, the number of significant figures in the quotient will equal the number of significant figures in the approximate number. Illustrations:

9.27/41 should be written .226, since 9.27, the approximate number, has three significant figures. The number 41 is an exact number.
8541/50 should be written 170.8, not 170.82, since 8541, the approximate number, contains only four significant figures.

(c) In dealing with exact numbers, quotients may be written to as many decimals as one wishes.
(3) Accuracy of a Root or Power

(a) The square root of an approximate number can contain no more significant figures than there are in the number itself. The number of significant figures retained in a square root is usually less than (often one-half) the number of significant figures in the number. For example, √159.5600 is usually written 12.63, and not 12.63171, although the original number, 159.5600, contains seven significant figures.

(b) The square, or higher power, of an approximate number contains as many significant figures as there are in the original number (and no more). For example, (.034)² = .0012 (two significant figures) and not .001156 (four significant figures).

(c) Roots and powers of exact numbers may be taken to as many decimal places as one wishes.


(4) Accuracy of a Sum or Difference 

The number of decimal places to be retained in a sum or difference should be no greater than the number of decimals in the least accurate of the numbers added or subtracted. Illustrations:

362.2 + 18.225 + 5.3062 = 385.7, not 385.7312, since the least accurate number (362.2) contains only one decimal.

362.2 − 18.245 = 344.0, not 343.955, since the less accurate number (362.2) contains only one decimal.
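The rules for products, quotients, roots, and powers all come down to rounding a result to a chosen number of significant figures, while sums and differences are rounded to a number of decimals. The sketch below is an illustration under our own assumptions: the helper `round_sig` is hypothetical, and it relies on Python's `'g'` format specifier, which rounds to nearest and may treat rare exact ties differently from the two-decimal rule given earlier. The number of figures to keep is chosen by hand according to the rules above.

```python
import math


def round_sig(x: float, n: int) -> float:
    """Round x to n significant figures using the 'g' format specifier."""
    return float(f"{x:.{n}g}")


# Products and quotients: keep the significant figures of the less accurate number.
print(round_sig(125.5 * 7.0, 2))      # 7.0 has two significant figures
print(round_sig(125.5 * 7.000, 4))    # both numbers have four
print(round_sig(12 * 8, 1))           # exact 12 times approximate 8
print(round_sig(9.27 / 41, 2))        # both approximate; 41 has two
print(round_sig(16 / 4724, 2))        # both approximate; 16 has two
print(round_sig(9.27 / 41, 3))        # 41 exact; 9.27 has three
print(round_sig(8541 / 50, 4))        # 50 exact; 8541 has four
# Roots and powers.
print(round_sig(math.sqrt(159.5600), 4))
print(round_sig(0.034 ** 2, 2))
# Sums and differences: keep the decimals of the least accurate number.
print(round(362.2 + 18.225 + 5.3062, 1))
print(round(362.2 - 18.245, 1))
```

The printed values are 880.0, 878.5, 100.0, 0.23, 0.0034, 0.226, 170.8, 12.63, 0.0012, 385.7, and 344.0, matching the illustrations; for 12 × 8 the nearest one-figure value, 100, is the one produced.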




PROBLEMS 


1. Indicate which of the following variables fall into continuous and which into discrete series: (a) time; (b) salaries in a large business firm; (c) sizes of elementary school classes; (d) age; (e) census data; (f) distance traveled by car; (g) football scores; (h) weight; (i) numbers of pages in 100 books; (j) mental ages.


2. Write the exact upper and lower limits of the following scores in 
accordance with the two definitions of a score in continuous series, 
given on pages 3 and 4: 

62 175 1 
8 312 87 


3. Suppose that sets of scores have the ranges given below. Indicate 
how large an interval, and how many intervals, you would suggest 
for use in drawing up a frequency distribution of each set. 


Range Size of Interval Number of Intervals 
16 to 87 
0 to 46 
110 to 212 
63 to 151 
4 to 12 


4. In each of the following write (a) the exact lower and upper limits 
of the class-intervals (following the first definition of a score, given 
on page 3), and (b) the midpoint of each interval. 

45-47 162.5-167.5 63-67 0-9 
1-4 80 up to 90 16-17 25-98 
5. (a) Tabulate the following twenty-five scores into two frequency 


distributions, using (1) an interval of three, and (2) an interval 
of five units. Let the first interval begin with the score of 60. 


72 75 77 67 72 
81 78 65 86 73
67 82 76 76 70 
83 71 63 72 72 
614 67 84 69 64 


(b) The following 100 scores were made on the Thorndike Intelligence Examination for High School Graduates by applicants for admission to college. Tabulate these scores into three frequency distributions, using class-intervals of three, five, and ten units. Let the first interval begin with the score 45.


63 78 76 58 95 
78 86 80 96 94 
E 78 92 86 88 
82 101 102 70 50
74 65 73 72 91 
103 90 87 74 83 
78 75 70 84 98 
86 73 85 99 93 
103 90 79 81 83 
87 86 93 89 76 
73 86 82 71 94 
95 84 90 73 75 
82 86 53 63 56 
89 76 81 105 78 
73 75 85 74 95 
92 83 72 98 110 
85 103 81 78 98 
80 86 96 78 71 
81 84 81 83 92 
90 85 85 96 72 


6. (a) Plot frequency polygons for the two distributions of twenty-five scores found in 5(a), using intervals of three and of five score units. Smooth both distributions (see p. 16) and plot the smoothed f's and the original scores on the same axes.

(b) Plot a frequency polygon of the 100 scores in 5(b) using an interval of ten score units. Superimpose a histogram upon the frequency polygon.

(c) On the same axes, plot a frequency polygon and histogram of the 100 Thorndike scores using an interval of five score units. Smooth the frequency polygon and plot on the same diagram.


7. Reduce the distributions A and B below to percentage frequencies and plot them as frequency polygons on the same axes. Is your understanding of the achievement of these groups advanced by this treatment of the data?


Scores    Group A    Group B
52-55        1          8
48-51        0          5
44-47        5         12
40-43       10         58
36-39       20         40
32-35       12         22
28-31        8         10
24-27        2         15
20-23        3          5
16-19        4          0
N           65        175

8. (a) Round off the following numbers to two decimals:
     3.5872    74.168    126.83500
     46.9223   25.193    81.72558
   (b) How many significant figures in each of the following:
     .00046    91.00     1.03
     46.02     18.365    15.0048
   (c) Write the answers to the following:
     127.4 × .0036 =    (both numbers approximate)
     200.0 ÷ 5.63 =     (both numbers approximate)
     62 × .053 =        (first number exact, second approximate)
     364.2 + 61.596 =
     364.2 − 61.596 =
     √47.86 =
     (18.6)² =


ANSWERS
2. 61.5 to 62.5 and 62.0 to 63.0; 174.5 to 175.5 and 175.0 to 176.0; .5 to 1.5 and 1.0 to 2.0; 7.5 to 8.5 and 8.0 to 9.0; 311.5 to 312.5 and 312.0 to 313.0; 86.5 to 87.5 and 87.0 to 88.0

3. Range          Size of Interval     No. of Intervals
   16 to 87       5                    15
   0 to 46        3 or 4 or 5          16 or 12 or 10
   110 to 212     10                   11
   63 to 151      5 or 10              18 or 9
   4 to 12        1                    9

4. Interval        (a) Exact Limits     (b) Midpoint
   45-47           44.5 to 47.5         46.0
   162.5-167.5     162.5 to 167.5       165.0
   63-67           62.5 to 67.5         65.0
   0-9             −.5 to 9.5           4.5
   1-4             .5 to 4.5            2.5
   80 up to 90     79.5 to 89.5         84.5
   16-17           15.5 to 17.5         16.5
   25-98           24.5 to 98.5         61.5

8. (c) .46; 35.5; 3.3; 425.8; 302.6; 6.918 or 6.92; 346


CHAPTER II 


MEASURES OF CENTRAL TENDENCY 


When scores or other measures have been tabulated into a frequency distribution, as shown in Chapter I, usually the next task is to calculate one or more measures of central tendency. The value of a measure of central tendency is twofold. First, it is a single measure which represents all of the scores made by the group, and as such gives a concise description of the performance of the group as a whole; and second, it enables us to compare two or more groups in terms of typical performance. There are three “averages” or measures of central tendency in common use: (1) the arithmetic mean, (2) the median, and (3) the mode. Popularly, the average is the term used for the arithmetic mean. In statistical work, however, the term average is often used as a general expression to cover any measure of central tendency.


I. CALCULATION OF MEASURES OF CENTRAL TENDENCY

1. The Arithmetic Mean or “Average” (M)
(1) Calculation of the Mean When Data Are Ungrouped

The arithmetic mean, or simply the mean, is the best known measure of central tendency. It may be defined as the sum of the separate scores or other measures divided by their number. To illustrate: if a man earns $3, $4, $3.50, $5, and $4.50 on five successive days, his mean daily wage ($4.00) is obtained by dividing the sum of his daily earnings by the number of days he has worked. The formula for the arithmetic mean of a series of ungrouped measures is

\[ M = \frac{\sum X}{N} \qquad (1) \]
(arithmetic mean calculated from ungrouped data)

in which N is the number of measures in the series, X stands for a score or other measure, and the symbol Σ means “sum of,” here sum of scores.
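Formula (1) translates directly into a few lines of code. The sketch below is a minimal Python illustration, assuming the scores are held in a plain list; the function name `arithmetic_mean` is hypothetical and not from the text. Applied to the five daily wages above, it reproduces the mean of $4.00.

```python
def arithmetic_mean(scores):
    """Formula (1): M = (sum of X) / N for ungrouped scores."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


# The five daily wages from the example above.
daily_wages = [3.00, 4.00, 3.50, 5.00, 4.50]
print(f"M = ${arithmetic_mean(daily_wages):.2f}")  # M = $4.00
```

Python's standard `statistics.mean` gives the same result; the explicit version is shown only to mirror ΣX/N term by term.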
(2) Calculation of the Mean from Data Grouped into a Frequency Distribution

When measures have been grouped into a frequency distribution, the arithmetic mean is calculated by a slightly different method from the one given above. The two examples given in Table 5 will make the differences clear. The first example shows the calculation of the mean of the fifty Army Alpha scores which were tabulated into a frequency distribution in Table 1. First calculate the fX column by multiplying the midpoint (X) of each interval by the number of scores (f) on it; the mean (170.80) is then simply the sum of the fX (namely, 8540)* divided by N (50). The use of the midpoint for all of the scores within an interval is made necessary by the fact that scores grouped into intervals lose their identity and must thereafter be represented by the midpoint of the particular interval in which they fall. Hence, we multiply or “weight” the midpoint of each interval by the frequency upon that interval, add the fX, and divide by N to obtain the mean. The formula may be written

\[ M = \frac{\sum fX}{N} \qquad (2) \]
(arithmetic mean calculated from scores grouped into a frequency distribution)


The second example in Table 5 is another illustration of the calculation of the mean from grouped data. This frequency distribution represents 200 scores made by a group of adults upon a cancellation test. Scores have been classified by method (B), page 7, into nine class-intervals; and since the intervals are four units, the midpoints are found by adding one-half of four to the lower limit of each. For example, in the first interval, 103.5 + 2.0 = 105.5. The fX column totals 23,888.0, and N equals 200. Hence, applying formula (2), the arithmetic mean is found to be 119.44 (to two decimals).

* The sum 8540 may be written 8540.000 . . . (i.e., to any number of significant figures), since each midpoint value (X) is an exact point within a score interval and the f's are exact numbers. The mean (170.80) has been carried only to two decimals, the usual standard of accuracy for measures of central tendency.
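Formula (2) weights each midpoint by its frequency before dividing by N. The following is a minimal Python sketch, assuming midpoints and frequencies are supplied as parallel lists (the name `grouped_mean` is hypothetical); the data are the two distributions of Table 5, and the loop should print means of 170.80 and 119.44.

```python
def grouped_mean(midpoints, freqs):
    """Formula (2): M = sum(fX) / N, each score represented by its interval midpoint."""
    if len(midpoints) != len(freqs):
        raise ValueError("one frequency per midpoint is required")
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    return sum_fx / n, sum_fx, n


# Table 5 (1): fifty Army Alpha scores, class-interval 5, high end first.
alpha_x = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]

# Table 5 (2): 200 cancellation scores, class-interval 4, high end first.
cancel_x = [137.5, 133.5, 129.5, 125.5, 121.5, 117.5, 113.5, 109.5, 105.5]
cancel_f = [3, 5, 16, 23, 52, 49, 27, 18, 7]

for x, f in [(alpha_x, alpha_f), (cancel_x, cancel_f)]:
    mean, sum_fx, n = grouped_mean(x, f)
    print(f"sum fX = {sum_fx}, N = {n}, M = {mean:.2f}")
```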
In both of the illustrations in Table 5, the M of the scores 
made by the members of a group was found. We may, however, 
use either formula (1) or formula (2) to calculate the M of a
number of measurements made upon the same individual. If 
an individual's reaction time to light is measured 100 times, 
and the measures tabulated into a frequency distribution, the 
M is found in exactly the same way in which we compute the 
“average” reaction time to light of 100 different observers. 


2. The Median (Mdn)*
(1) Calculation of the Median When Data Are Ungrouped

When ungrouped scores or other measures are arranged in order of size, the median is the midpoint in the series. Two situations arise in the computation of the median from ungrouped data: (a) when N is odd, and (b) when N is even. To consider, first, the case where N is odd, suppose we have the following integral “mental ages”: 7, 10, 8, 12, 9, 11, 7, calculated from seven performance tests. If we arrange these seven scores in order of size

7   7   8   (9)   10   11   12

the median is 9.0, since 9.0 is the midpoint of that score which lies midway in the series. Calculation is as follows: there are three scores above and three below 9, and since a score of 9 covers the interval 8.5 to 9.5, its midpoint is 9.0. This is the median.


Now if we drop the first score of 7, our series contains six scores

7   8   9   10   11   12

and the median is 9.5. Counting three scores in from the beginning of the series, we complete score 9 (which is 8.5 to 9.5) to reach 9.5, the upper limit of score 9. In like manner, counting three scores in from the end of the series, we move through score 10 (10.5 to 9.5), reaching 9.5, the lower limit of score 10.

* The median is also designated as Md, …

TABLE 5
THE CALCULATION OF THE MEAN, MEDIAN, AND CRUDE MODE FROM DATA GROUPED INTO A FREQUENCY DISTRIBUTION

1. Data from Table 1, fifty Army Alpha scores
Class-interval = 5

Class-Intervals    Midpoint     f       fX
Scores             X
195-199            197          1       197
190-194            192          2       384
185-189            187          4       748
180-184            182          5       910
175-179            177          8      1416
170-174            172         10      1720
165-169            167          6      1002
160-164            162          4       648
155-159            157          4       628
150-154            152          2       304
145-149            147          3       441
140-144            142          1       142
                           N = 50      8540

N/2 = 25
(1) Mean = ΣfX/N = 8540/50 = 170.80
(2) Median = 169.5 + (5/10) × 5 = 172.00
(3) Crude Mode falls on class-interval 170-174, or at 172.00

2. Scores made by 200 adults upon a cancellation test
Class-interval = 4

Class-Intervals    Midpoint     f       fX
Scores             X
135.5 to 139.5     137.5        3       412.5
131.5 to 135.5     133.5        5       667.5
127.5 to 131.5     129.5       16      2072.0
123.5 to 127.5     125.5       23      2886.5
119.5 to 123.5     121.5       52      6318.0
115.5 to 119.5     117.5       49      5757.5
111.5 to 115.5     113.5       27      3064.5
107.5 to 111.5     109.5       18      1971.0
103.5 to 107.5     105.5        7       738.5
                          N = 200     23888.0

N/2 = 100
(1) Mean = ΣfX/N = 23888.0/200 = 119.44
(2) Median = 115.5 + (48/49) × 4 = 119.42
(3) Crude Mode falls on class-interval 119.5 to 123.5, or at 121.50

A formula for finding the median of a series of ungrouped scores is

\[ \text{Median} = \text{the } \left(\tfrac{N+1}{2}\right)\text{th measure in order of size} \qquad (3) \]
(median from ungrouped data)

In our first illustration above, the median is on the (7 + 1)/2 or fourth score counting in from either end of the series, that is, 9.0 (the midpoint of 8.5 to 9.5). In our second illustration, the median is on the (6 + 1)/2 or 3.5th score in order of size, that is, 9.5 (the upper limit of score 9, or the lower limit of score 10).
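Formula (3) can be coded directly: locate position (N + 1)/2 in the ordered series and, when that position falls between two scores, take the point halfway between them. The following is a Python sketch under that reading; the function name is hypothetical. For the integral scores of the second illustration, halfway between 9 and 10 is the same 9.5 the text reaches by counting. Python's `statistics.median` follows the same convention and is used below only as a cross-check.

```python
import statistics


def ungrouped_median(scores):
    """Formula (3): the median is the (N + 1)/2th measure in order of size.

    When (N + 1)/2 is a whole number it names one score; otherwise the median
    lies halfway between the two scores on either side of that position.
    """
    if not scores:
        raise ValueError("at least one score is required")
    ordered = sorted(scores)
    position = (len(ordered) + 1) / 2  # 1-based position in the ordered series
    below = ordered[int(position) - 1]
    if position.is_integer():
        return float(below)
    above = ordered[int(position)]
    return (below + above) / 2


mental_ages = [7, 10, 8, 12, 9, 11, 7]
print(ungrouped_median(mental_ages))      # 9.0 (N odd)
print(ungrouped_median(mental_ages[1:]))  # 9.5 (first 7 dropped, N even)
print(statistics.median(mental_ages))     # 9 (same point)
```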


(2) Calculation of the Median When Data Are Grouped into a Frequency Distribution

When scores in a continuous series are grouped into a frequency distribution, the median by definition is the 50% point in the distribution. To locate the median, therefore, we take 50% (i.e., N/2) of our scores and count into the distribution until the 50% point is reached. The method is illustrated in the two examples in Table 5. Since there are fifty scores in the first distribution, N/2 = 25, and the median is that point in our distribution of Army Alpha scores which has twenty-five scores on each side of it. Beginning at the small-score end of the distribution and adding up the scores in order, we find that intervals 140-144 to 165-169, inclusive, contain just 20 f's, five scores short of the twenty-five necessary to locate the median. The next interval, 170-174, contains ten scores, assumed to be spread evenly over the interval (p. 8). In order to get the five extra scores needed to make exactly twenty-five, we take 5/10 × 5 (the length of the interval) and add this increment (2.5) to 169.5, the beginning of the interval 170-174. This puts the Mdn at 169.5 + 2.5, or at 172.0. The reader should note carefully that the median, like the mean, is a point and not a score.

A second illustration of the calculation of the median from data grouped into a frequency distribution is given in Table 5 (2). There are 200 scores in this distribution; hence, N/2 = 100, and the median must lie at a point 100 scores distant from either end of the distribution. If we begin at the small-score end of the distribution (103.5 to 107.5) and add the scores in order, fifty-two scores take us through the interval 111.5 to 115.5. The 49 scores on the next interval (115.5 to 119.5) plus the fifty-two already counted off total 101, one score too many to give us 100, the point at which the median falls. To get the forty-eight scores needed to make exactly 100, we must take 48/49 × 4 (the length of the interval) and add this amount (3.92) to 115.5, the beginning of interval 115.5 to 119.5. This procedure takes us exactly 100 scores into the distribution and locates the median at 119.42.

A formula for calculating the Mdn when the data have been classified into a frequency distribution is

\[ Mdn = l + \left( \frac{N/2 - F}{f_m} \right) i \qquad (4) \]
(median computed from data grouped into a frequency distribution)

where
l = lower limit of the class-interval upon which the median lies
N/2 = one-half the total number of scores
F = sum of the scores on all intervals below l
fm = frequency (number of scores) within the interval upon which the median falls
i = length of the class-interval

To illustrate the use of formula (4), consider the first example in Table 5. Here l = 169.5, N/2 = 25, F = 20, fm = 10, and i = 5. Hence, the median falls at 169.5 + ((25 − 20)/10) × 5, or 172.0. In the second example, l = 115.5, N/2 = 100, F = 52, fm = 49, and i = 4. The median, therefore, is 115.5 + ((100 − 52)/49) × 4, or 119.42.
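Formula (4) follows the counting procedure exactly: accumulate F from the small-score end until the interval holding the N/2th score is reached, then interpolate within it. The sketch below is a minimal Python version, assuming intervals are given by their exact lower limits in ascending order (the function name is hypothetical). It reproduces 172.00 and 119.42 for the two Table 5 distributions.

```python
def grouped_median(lower_limits, freqs, i):
    """Formula (4): Mdn = l + ((N/2 - F) / fm) * i.

    lower_limits: exact lower limit of each class-interval (e.g. 169.5 for 170-174),
    listed from the small-score end upward; freqs: the f on each interval;
    i: length of the class-interval. Zero-frequency intervals are passed over.
    """
    n = sum(freqs)
    half = n / 2
    F = 0  # scores on all intervals below the one being examined
    for l, fm in zip(lower_limits, freqs):
        if fm > 0 and F + fm >= half:
            return l + (half - F) / fm * i
        F += fm
    raise ValueError("empty distribution")


# Table 5 (1), listed from 140-144 upward.
alpha_lower = [139.5 + 5 * k for k in range(12)]
alpha_f_up = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
# Table 5 (2), listed from 103.5 to 107.5 upward.
cancel_lower = [103.5 + 4 * k for k in range(9)]
cancel_f_up = [7, 18, 27, 49, 52, 23, 16, 5, 3]

print(f"{grouped_median(alpha_lower, alpha_f_up, 5):.2f}")    # 172.00
print(f"{grouped_median(cancel_lower, cancel_f_up, 4):.2f}")  # 119.42
```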
The steps involved in computing the Mdn from data tabulated into a frequency distribution may be summarized as follows:

(1) Find N/2, that is, one-half of the cases in the distribution.
(2) Begin at the small-score end of the distribution and count off the scores in order up to the lower limit (l) of the interval which contains the median. The sum of these scores is F.
(3) Compute the number of scores necessary to fill out N/2, i.e., compute N/2 − F. Divide this quantity by the frequency (fm) on the interval which contains the median, and multiply the result by the size of the class-interval (i).
(4) Add the amount obtained by the calculations in (3) to the lower limit (l) of the interval which contains the Mdn. This will give the median of the distribution.

The median may also be computed by adding up one-half of the scores from the top down in a frequency distribution. The procedure is the same through Step (3) in the outline above. When we count down from the top of the distribution, however, the quantity found in Step (3) must be subtracted from the upper limit of the interval containing the median. To illustrate with the data of Table 5 (1), counting down in the f column, twenty scores complete interval 175-179, and we reach 174.5, the upper limit of the interval 170-174. Five scores on this interval are needed to make twenty-five (N/2), so the median falls at 174.5 − (5/10 × 5), or 172.0, the same value as before. In Table 5 (2), the …
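Counting down from the top mirrors formula (4): accumulate scores from the high-score end and subtract the interpolated amount from the upper limit of the interval containing the median. The following Python sketch assumes intervals are given by their exact upper limits in descending order (the function name is hypothetical). On Table 5 (1) it gives 172.00, and on Table 5 (2) it gives 119.42, the same values found by counting up.

```python
def grouped_median_from_top(upper_limits, freqs, i):
    """Median found by counting N/2 scores down from the high-score end.

    upper_limits: exact upper limit of each class-interval (e.g. 174.5 for 170-174),
    listed from the high-score end downward. The interpolated amount is
    subtracted from the upper limit of the interval containing the median.
    """
    n = sum(freqs)
    half = n / 2
    counted = 0  # scores on all intervals above the one being examined
    for u, fm in zip(upper_limits, freqs):
        if fm > 0 and counted + fm >= half:
            return u - (half - counted) / fm * i
        counted += fm
    raise ValueError("empty distribution")


# Table 5 (1), listed from 195-199 downward.
alpha_upper = [199.5 - 5 * k for k in range(12)]
alpha_f_down = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
# Table 5 (2), listed from 135.5 to 139.5 downward.
cancel_upper = [139.5 - 4 * k for k in range(9)]
cancel_f_down = [3, 5, 16, 23, 52, 49, 27, 18, 7]

print(f"{grouped_median_from_top(alpha_upper, alpha_f_down, 5):.2f}")    # 172.00
print(f"{grouped_median_from_top(cancel_upper, cancel_f_down, 4):.2f}")  # 119.42
```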




(3) Calculation of the Mdn When (a) the Frequency Distribution Contains Gaps, and When (b) the First or Last Interval Has Indeterminate Limits

(a) Difficulty arises when it becomes necessary to calculate the median from a distribution in which there are gaps, or zero frequency upon one or more intervals. The method to be followed in such cases is shown in Table 6. Since N = 10 and N/2 = 5, we count up the frequency column five scores through 6-7. Ordinarily, this would put the median at 7.5, the lower limit of interval 8-9. If we check this median, however, by counting down the frequency column five scores, the median falls at 11.5, the lower limit of 12-13. Obviously, the discrepancy between these two values of the median is due to the two intervals 8-9 and 10-11 (each of which has zero frequency) which lie between 6-7 and 12-13. In order to have the median come out at the same point, whether computed from the top or the bottom of the frequency distribution, the procedure usually followed in cases like this is to have interval 6-7 include 8-9, thus becoming 6-9, and to have interval 12-13 include 10-11, becoming 10-13. Lengthening these intervals in this way eliminates the zero frequencies on intervals 8-9 and 10-11. Now if we count off five scores going up the frequency column, the median falls at 9.5, the upper limit of 6-9; and if we count down the frequency column five scores, we again obtain a median value of 9.5, the lower limit of 10-13. Computation from the two ends of the series now gives consistent results, the median in both cases being 9.5.

TABLE 6
COMPUTATION OF THE MEDIAN WHEN THERE ARE GAPS IN THE DISTRIBUTION

Class-Intervals
Scores
20-21
18-19
16-17
14-15
12-13   } combined into 10-13
10-11   }
8-9     } combined into 6-9
6-7     }

N = 10
Mdn = 9.5

(b) Sometimes the last class-interval in a frequency distribution is designated as “80 and above” or 80+. This means that all scores above 80 are thrown into this interval, the upper limit of which is indeterminate. A piling up of scores may also occur at the beginning of a distribution, when the first interval, for example, is designated “20 and below” or 20−. The lower limit of the first class-interval is now indeterminate. In irregular distributions like these, the median is readily computed, since each score is simply counted as one frequency whether accurately classified or not. But it is impossible to calculate the mean exactly when the midpoint of one or more intervals is unknown. The mean depends upon the absolute size of the scores (or their midpoints) and is directly affected by indeterminate interval limits.
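The gap rule in (a) can be checked by computing the median from both ends. The Python sketch below uses a small hypothetical distribution (N = 10, with empty intervals 8-9 and 10-11), not the data of Table 6; the function names and the (lower, upper, f) tuple layout are illustrative assumptions. Because each interval carries its own limits, the widened intervals 6-9 and 10-13 need no special handling.

```python
def median_up(intervals):
    """Count N/2 scores up from the low end; intervals are (lower, upper, f), low end first."""
    half = sum(f for _, _, f in intervals) / 2
    below = 0
    for lower, upper, f in intervals:
        if f > 0 and below + f >= half:
            return lower + (half - below) / f * (upper - lower)
        below += f
    raise ValueError("empty distribution")


def median_down(intervals):
    """Count N/2 scores down from the high end and subtract from the upper limit."""
    half = sum(f for _, _, f in intervals) / 2
    above = 0
    for lower, upper, f in reversed(intervals):
        if f > 0 and above + f >= half:
            return upper - (half - above) / f * (upper - lower)
        above += f
    raise ValueError("empty distribution")


# HYPOTHETICAL frequencies (N = 10) with empty intervals 8-9 and 10-11; not the
# Table 6 data. Limits are exact limits, e.g. 5.5-7.5 for interval 6-7.
with_gaps = [(3.5, 5.5, 2), (5.5, 7.5, 3), (7.5, 9.5, 0),
             (9.5, 11.5, 0), (11.5, 13.5, 3), (13.5, 15.5, 2)]
print(median_up(with_gaps), median_down(with_gaps))  # 7.5 11.5 (inconsistent)

# Widen 6-7 to 6-9 and 12-13 to 10-13, as described in (a).
merged = [(3.5, 5.5, 2), (5.5, 9.5, 3), (9.5, 13.5, 3), (13.5, 15.5, 2)]
print(median_up(merged), median_down(merged))  # 9.5 9.5
```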


3. The Mode 


In a simple ungrouped series of measures, the “crude” or “empirical” mode is that single measure or score which occurs most frequently. For example, in the series 10, 11, 11, 12, 12, 13, 13, 13, 14, 14, the most often recurring measure, namely 13, is the crude or empirical mode. When data are grouped into a frequency distribution, the crude mode is usually taken to be the midpoint of that interval which contains the largest frequency. In example 1, Table 5, the interval 170-174 contains the largest frequency, and hence 172.0, its midpoint, is the crude mode. In example 2, Table 5, the largest frequency falls on 119.5 to 123.5, and the crude mode is at 121.5, the midpoint.
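Both kinds of crude mode are simple to compute. The following is a Python sketch: for the ungrouped series it uses the standard `statistics.mode`, and for grouped data a hypothetical helper `crude_mode_grouped` returns the midpoint of the interval with the largest f (taking the first interval if two tie, an assumption the text does not address).

```python
import statistics


def crude_mode_grouped(midpoints, freqs):
    """Crude mode of grouped data: midpoint of the interval with the largest f.

    If two intervals tie for the largest f, this sketch simply returns the first one.
    """
    largest = max(range(len(freqs)), key=lambda k: freqs[k])
    return midpoints[largest]


series = [10, 11, 11, 12, 12, 13, 13, 13, 14, 14]
print(statistics.mode(series))  # 13

alpha_x = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
cancel_x = [137.5, 133.5, 129.5, 125.5, 121.5, 117.5, 113.5, 109.5, 105.5]
cancel_f = [3, 5, 16, 23, 52, 49, 27, 18, 7]
print(crude_mode_grouped(alpha_x, alpha_f))    # 172
print(crude_mode_grouped(cancel_x, cancel_f))  # 121.5
```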
When calculating the mode from a frequency distribution, we distinguish between the “true” mode and the crude mode. The true mode is the point (or “peak”) of greatest concentration in the distribution; that is, the point at which more measures fall than at any other point. When the scale is divided into finely graduated units, when scores are recorded exactly, and when N is large, the crude mode closely approaches the true mode. Ordinarily, however, the crude mode is only approximately equal to the true mode. A formula for approximating the true mode, when the frequency distribution is symmetrical or at least not badly skewed (p. 119), is

\[ \text{Mode} = 3\,\text{Mdn} - 2\,\text{Mean} \qquad (5) \]
(approximation to the true mode calculated from a frequency distribution)

If we apply this formula to the data in Table 5, the mode is 174.40 for the first distribution and 119.38 for the second. The first mode is somewhat larger, and the second slightly smaller, than the crude modes obtained from the same distributions.
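Formula (5) needs only the median and mean already found. The following Python sketch (the function name is hypothetical) reproduces the two values quoted above from the Table 5 medians and means.

```python
def approx_true_mode(median, mean):
    """Formula (5): Mode = 3 Mdn - 2 Mean, for distributions not badly skewed."""
    return 3 * median - 2 * mean


# Medians and means of the two Table 5 distributions.
print(f"{approx_true_mode(172.00, 170.80):.2f}")  # 174.40
print(f"{approx_true_mode(119.42, 119.44):.2f}")  # 119.38
```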

The crude mode is often an unstable measure of central tend- 
ency. This instability is not, however, so serious a drawback 
as might seem at first glance. The crude mode is usually em- 
ployed as a simple, inspectional "average," to indicate in a 
rough way the center of concentration in the distribution; and 
for this purpose it need not be calculated as exactly as the 
median and mean. 


II. CALCULATION OF THE MEAN BY THE “ASSUMED MEAN” OR SHORT METHOD


In Table 5 the mean was calculated by multiplying the midpoint (X) of each interval by the frequency (number of scores) on the interval, summing up these values (the fX column), and dividing by N, the number of scores. This straightforward method (called the Long Method) gives accurate results, but it often requires the handling of large numbers and entails tedious calculation. Because of this, the “Assumed Mean” method, or simply the Short Method, has been devised for computing the mean. The Short Method does not apply to the calculation of the median or the mode. These measures are always found by the methods previously described.

The most important fact to remember in calculating the mean by the Short Method is that we “guess” or “assume” a mean at the outset, and later apply a correction to this assumed value (AM) in order to obtain the actual mean (M) (see Table 7). There is no set rule for assuming a mean.* The best plan is to take the midpoint of an interval somewhere near the center of the distribution, and if possible the midpoint of that interval which contains the largest frequency. In Table 7, the largest f is on interval 170-174, which also happens to be almost the center of the distribution. Hence the AM is taken at 172.0, the middle of this interval. When the question of the AM is settled, we determine the correction which must be applied to the AM in order to get M. Steps are as follows:

(1) First, we fill in the x′ column,† column (4). Here are entered the deviations of the midpoints of the different steps, measured from the AM in units of class-interval. Thus 177, the midpoint of 175-179, deviates from 172, the AM, by one interval, and a “1” is placed in the x′ column opposite 177. In like manner, 182 deviates two intervals from 172, and a “2” goes in the x′ column opposite 182. Reading on up the x′ column from 172, we find the succeeding entries to be 3, 4, and 5. The last entry, 5, is the interval-deviation of 197 from 172; the actual score-deviation, of course, is 25.

Returning to 172, we find that the x′ of this midpoint measured from the AM (from itself) is zero; hence a zero is placed in the x′ column opposite 170-174. Below 172, all of the x′ entries are negative, since all of the midpoints are less than 172, the AM. So the x′ of 167 from 172 is −1 interval, and the x′ of 162 from 172 is −2 intervals. The other x′'s are −3, −4, −5, and −6 intervals.

(2) The x′ column completed, we compute the fx′ column, column (5). The fx′ entries are found in exactly the same way as are the fX in Table 5, page 35. Each x′ in column (4) is multiplied or “weighted” by the appropriate f in column (3). Note again that in the Short Method we multiply each f by the deviation (x′) of its midpoint from the AM in units of class-interval, instead of by the midpoint itself as in the Long Method. For this reason, the computation of the fx′ column is much simpler than the calculation of the fX column by the method given on page 33. All of the fx′ on intervals above (greater than) the AM are positive, and all fx′ on intervals below (smaller than) the AM are negative, since the signs of the fx′ depend upon the signs of the x′.

(3) From the fx′ column the correction is obtained as follows: the sum of the positive values in the fx′ column is 43, and the sum of the negative values in the fx′ column is −55. The algebraic sum of the fx′ is therefore −12; and −12 divided by 50 (N) gives −.240, which is the correction (c) in units of class-interval. If we multiply c (−.240) by i, the length of the interval (here 5), the result is ci (−1.20), the score correction, or the correction in score units. When −1.20 is added to 172.00, the AM, the result is the actual mean, 170.80.

† x′ stands for the deviation of a score X from the assumed mean (AM); x, for the deviation of a score X from the actual mean (M) of the distribution.

TABLE 7
THE CALCULATION OF THE MEAN BY THE SHORT METHOD
(Data from Table 1, fifty Army Alpha scores)

(1)               (2)        (3)    (4)    (5)
Class-Intervals   Midpoint   f      x′     fx′
Scores            X
195-199           197         1      5       5
190-194           192         2      4       8
185-189           187         4      3      12
180-184           182         5      2      10
175-179           177         8      1       8    (+43)
170-174           172        10      0       0
165-169           167         6     −1      −6
160-164           162         4     −2      −8
155-159           157         4     −3     −12
150-154           152         2     −4      −8
145-149           147         3     −5     −15
140-144           142         1     −6      −6    (−55)
                  N = 50                   −12

AM = 172.00
c = −12/50 = −.240
ci = −.240 × 5 = −1.20
M = 172.00 − 1.20 = 170.80


The process of calculating the mean by the Short Method may be summarized as follows:

(1) Tabulate the scores or measures into a frequency distribution.
(2) “Assume” a mean as near the center of the distribution as possible, and preferably on the interval containing the largest frequency.
(3) Find the deviation of the midpoint of each class-interval from the AM in units of interval.
(4) Multiply or weight each deviation (x′) by its appropriate f, the f opposite it.
(5) Find the algebraic sum of the plus and minus fx′ and divide this sum by N, the number of cases. This gives c, the correction in units of class-interval.
(6) Multiply c by the interval length (i) to get ci, the score correction.
(7) Add ci algebraically to the AM to get the actual mean. Sometimes ci will be positive and sometimes negative, depending upon where the mean has been assumed. The method works equally well in either case.
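The seven steps above reduce to three lines of arithmetic once the distribution is tabulated. The following is a minimal Python sketch, assuming the AM is chosen as one of the midpoints so that every x′ is a whole number (the function name is hypothetical). For Table 7 it gives Σfx′ = −12, c = −.240, ci = −1.20, and M = 170.80.

```python
def short_method_mean(midpoints, freqs, assumed_mean, i):
    """Mean by the Assumed Mean (Short) Method.

    x' = (X - AM) / i   deviation of each midpoint from the AM in interval units
    c  = sum(f x') / N  correction in units of class-interval
    M  = AM + c * i     c * i is the score correction
    Assumes assumed_mean is one of the midpoints, so every x' is a whole number.
    """
    n = sum(freqs)
    x_prime = [round((x - assumed_mean) / i) for x in midpoints]
    sum_fx_prime = sum(f * d for f, d in zip(freqs, x_prime))
    c = sum_fx_prime / n
    return assumed_mean + c * i, sum_fx_prime, c


# Table 7: fifty Army Alpha scores, AM = 172, i = 5.
alpha_x = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
mean, total, c = short_method_mean(alpha_x, alpha_f, 172, 5)
print(f"sum fx' = {total}, c = {c:.3f}, ci = {c * 5:.2f}, M = {mean:.2f}")
```

As step (7) notes, the choice of AM does not matter. Assuming 162 instead gives a positive correction (c = 1.76, ci = 8.80) and the same M of 170.80. For Table 5 (2) with AM = 121.5 and i = 4, Σfx′ = −103 and M = 119.44, agreeing with the Long Method.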


III. WHEN TO USE THE VARIOUS MEASURES OF CENTRAL TENDENCY
The beginning student of statistics is often puzzled to know 
which measure of central tendency to use in a given problem. 
The following summary will serve as a convenient guide for 
most statistical work. 
1. Use the mean 
(1) When each score or measure should have equal weight 
in determining the central tendency. Since the mean is 
the sum of the scores divided by their number, each 
score has equal weight in its determination. 
(2) When the measure of central tendency having the 
highest reliability is desired. (p. 193) 
(3) When standard deviations and product-moment coefficients of correlation are to be subsequently computed. (p. 282)


2. Use the median 

(1) When a quick and easily computed measure of central 
tendency is wanted. 

(2) When there are extreme measures which would affect 
the mean disproportionately (p. 39). 

(3) When it is desired that certain scores should influence 
the central tendency but all that is known about them 
is that they are above or below the median (p. 40). 


3. Use the mode 
(1) When the most often recurring score is sought. 
(2) When a quick approximate measure of concentration is 
all that is wanted. 




PROBLEMS 


1. Calculate the mean, median, and mode for the following frequency distributions. Use the Short Method in computing the mean.

(1) Scores    f          (2) Scores    f
    70-71     2              90-94     2
    68-69     2              85-89     2
    66-67     3              80-84     4
    64-65     4              75-79     8
    62-63     6              70-74     6
    60-61     7              65-69    11
    58-59     5              60-64     9
    56-57     4              55-59     7
    54-55     2              50-54     5
    52-53     3              45-49     0
    50-51     1              40-44     2
    N = 39                   N = 56

(3) Scores    f          (4) Scores    f
    120-122   2              100-109   5
    117-119   2              90-99     9
    114-116   2              80-89    14
    111-113   4              70-79    19
    108-110   5              60-69    21
    105-107   9              50-59    30
    102-104   6              40-49    25
    99-101    3              30-39    15
    96-98     4              20-29    10
    93-95     2              10-19     8
    90-92     1              0-9       6
    N = 40                   N = 162

2. Compute the mean and the median for each of the two distributions in problem 5(a), page 28, tabulated in three- and five-unit intervals. Compare the two means and the two medians, and explain any discrepancy found. (Let the first interval in the first distribution be 61-63; the first interval in the second distribution, …)




3. (a) Compute the median of the following sixteen scores:

Scores
20 to 22
18 to 20
16 to 18
14 to 16
12 to 14
10 to 12
8 to 10
6 to 8
4 to 6
2 to 4
0 to 2

N = 16

(b) In a group of fifty children, the eight children who took longer 
than five minutes to complete a performance test were marked 
D.N.C. (did not complete). In computing a measure of central
tendency for this distribution of scores, what measure would you 
use, and why? 

(c) Find the medians of the following arrays of ungrouped scores: 
(1) 21, 24, 27, 29, 29, 30, 32, 33, 35, 38, 42, 45. 
(2) 54, 59, 64, 67, 70, 72, 73, 75, 78, 83, 90. 
(3) 7, 8, 9, 9, 10, 11. 

4. The time by your watch is 10:31 o'clock. In checking with two 
friends, you find that their watches give the time as 10:25 and 10:34. 
Assuming that the three watches are equally good timepieces, what 
do you think is probably the “correct time"? 

5. What is meant popularly by the “law of averages”?

6. (a) When one uses the term “іп the mode" does he have reference 

to the mode of a distribution? 

(b) What is approximately the modal time for each of the following 
meals: breakfast, lunch, dinner. Explain your answers. 

(c) Why is the median usually the best measure of the typical con- 
tribution in a church collection? 




ANSWERS

1. (1) Mean = 60.76    Median = 60.79    Mode = 60.85
   (2) Mean = 67.36    Median = 66.77    Mode = 65.59
   (3) Mean = 106.00   Median = 105.83   Mode = 105.49
   (4) Mean = 55.43    Median = 55.17    Mode = 54.65
2. Class-interval = 3: Mean = 72.9, Median = 71.7
   Class-interval = 5: Mean = 73.00, Median = 72.71
3. (a) Median = 11.5
   (c) (1) Median = 31.0
       (2) Median = 72.0
       (3) Median = …
4. Mean is 10:30.


CHAPTER III
MEASURES OF VARIABILITY 


In Chapter II the calculation of three measures of central tendency (measures typical or representative of a set of scores as a whole) was described. Ordinarily, the next step is to find some measure of the variability of our scores, i.e., of the “scatter” or “spread” of the separate scores or measures around their central tendency. It will be the task of this chapter to show how measures of variability may be computed.

The usefulness of a measure of variability can be seen from a simple example. Suppose a test of controlled association has been administered to a group of fifty boys and to a group of fifty girls. The mean scores are: boys, 34.6 seconds, and girls, 34.5 seconds. So far as the means go, there is no difference in the performance of the two groups. But suppose the boys' scores are found to range from 15 to 51 seconds and the girls' scores from 19 to 45 seconds. This difference in range shows that in a general way the boys “cover more territory,” that is, are more variable than the girls; and this greater variability may be of more interest than the lack of a difference in the means. If a group is homogeneous, that is, made up of individuals of nearly the same ability, most of the scores will fall around the same point on the scale, the range will be relatively short, and the variability will be small. But if the group contains individuals of widely differing capacities, scores will be strung out from high to low, the range will be relatively wide, and the variability large.

This situation is represented graphically in Figure 7, which shows two frequency distributions of the same area (N) and same mean (50) but of very different variability. Group A ranges from 20 to 80, and Group B from 40 to 60. Group A is three times as variable as Group B (it spreads over three times the distance on the scale of scores), though both distributions have the same central tendency.

[Figure: two frequency distributions plotted on a score scale running from 20 to 80]
FIG. 7. Two Distributions of the Same Area (N) and Mean (50) but of Very Different Variability.

Four measures have been devised to indicate the variability or dispersion within a set of measures. These are (1) the range, (2) the quartile deviation or Q, (3) the mean deviation or MD, and (4) the standard deviation or SD.


І. CALCULATION OF MEASURES OF VARIABILITY 


1. The Range 


In grouping the scores in Table 1 into a frequency distribution (p. 6), we have already had occasion to use the range. It may be redefined simply as the interval between the largest and the smallest scores. In the illustration above, the range of the boys' scores was 51 − 15, or 36 seconds, and the range of the girls' scores 45 − 19, or 26 seconds. The range is the most general measure of spread or scatter, and is computed when we wish to make a rough comparison of two or more groups for variability. Since the range takes account of the extremes of the series only, it is unreliable when N is small or when many or large gaps (i.e., zero f's) occur in the frequency distribution.


2. The Quartile Deviation or Q 


The quartile deviation or Q is one-half of the distance between the 75th and 25th percentiles in a frequency distribution. The 25th percentile, called Q1, is the first quarter or quartile on the scale of scores, the point below which lie 25% of the scores. The 75th percentile, or Q3, is the third quarter or quartile on the score-scale, the point below which lie 75% of the scores.*

To find Q, we must first calculate the 75th and 25th percentiles. These values are found by exactly the same method employed in calculating the median. To find Q1, count off 25% of the scores from the beginning of the distribution (low end); to find Q3, count off 75% of the scores from the low end of the distribution, or 25% from the high end.

Table 8 illustrates the calculation of Q for the distribution of fifty Alpha scores tabulated in Table 1. First, to find Q1, count off 1/4 of N (12.5) from the low-score end of the distribution. When the scores (f) are added in order, the first four class-intervals (140-144 to 155-159, inclusive) are found to contain 10 scores. The next interval, 160-164, contains four scores, assumed to be spread evenly over the interval. Since we need only 2.5 additional scores to make up the necessary 12.5, take 2.5/4 × 5 (the interval) and add this amount, 3.13, to 159.50, the beginning of the interval which contains Q1. This calculation locates Q1 at 162.63 (see Table 8).

Q3 is found in the same way by counting off 3/4 of N (37.5) from the small-score end of the distribution. The f's on 140-144 to 170-174, inclusive, added in order, total 30. The next interval, 175-179, contains eight scores. To make up the necessary 37.5, therefore, take 7.5/8 × 5 (the interval) and add this amount (4.69) to 174.50. This puts Q3 at 179.19 (see Table 8).

When Q1 and Q3 are known, Q, the quartile deviation, is found from the formula

\[ Q = \frac{Q_3 - Q_1}{2} \qquad (6) \]
(quartile deviation calculated from grouped data)

In the present problem, Q = (179.19 − 162.63)/2, or 8.28.

* It may be noted that the second quartile, Q2, is the median.


A second illustration of the calculation of Q from a frequency distribution is given in Table 8, example 2. Since the N of this distribution is 200, 1/4 of N equals 50. The intervals 103.5 to 107.5 and 107.5 to 111.5 contain twenty-five scores, and the next interval, 111.5 to 115.5, contains twenty-seven scores, which makes a total of fifty-two, two more than the fifty wanted. To find the point reached by just fifty scores, take 25/27 × 4 (the interval) and add this amount (3.70) to 111.50, the lower limit of 111.5 to 115.5. This locates Q1 at 115.20.

To find Q3, count off 3/4 of N, or 150 scores, from the small-score end of the distribution. The first four intervals include 101 scores, and the next interval, 119.5 to 123.5, contains fifty-two scores. To fill out the required 150, take 49/52 × 4, the length of the interval, and add this increment (3.77) to 119.50 to locate Q3 at 123.27. Substituting 115.20 for Q1 and 123.27 for Q3 in formula (6), we get a Q of 4.04.
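Q1 and Q3 are found by the same count-and-interpolate routine as the median, with 1/4 or 3/4 of N in place of N/2. The sketch below generalizes that routine to any fraction p and then applies formula (6); the function names and the ascending exact-lower-limit layout are illustrative assumptions. Carried to full precision, the cancellation scores give Q ≈ 4.033. The text's 4.04 comes from subtracting the quartiles after rounding them to 115.20 and 123.27.

```python
def percentile_point(lower_limits, freqs, i, p):
    """Point below which the fraction p of the scores lie (p = .25 gives Q1, .75 gives Q3).

    Same counting-and-interpolation method as the median: lower_limits are exact
    lower limits listed from the small-score end upward, freqs the f on each interval,
    i the length of the class-interval.
    """
    target = p * sum(freqs)
    below = 0
    for l, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            return l + (target - below) / f * i
        below += f
    raise ValueError("empty distribution")


def quartile_deviation(lower_limits, freqs, i):
    """Formula (6): Q = (Q3 - Q1) / 2."""
    q1 = percentile_point(lower_limits, freqs, i, 0.25)
    q3 = percentile_point(lower_limits, freqs, i, 0.75)
    return q1, q3, (q3 - q1) / 2


# Table 8 (1): fifty Army Alpha scores, from 140-144 upward.
alpha = ([139.5 + 5 * k for k in range(12)], [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1], 5)
# Table 8 (2): 200 cancellation scores, from 103.5 to 107.5 upward.
cancel = ([103.5 + 4 * k for k in range(9)], [7, 18, 27, 49, 52, 23, 16, 5, 3], 4)

for lower, f, i in (alpha, cancel):
    q1, q3, q = quartile_deviation(lower, f, i)
    print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, Q = {q:.3f}")
```

Three decimals are printed because the Alpha Q1 is exactly 162.625. Python's two-decimal formatting rounds such exact halves to even (162.62), whereas the text rounds up to 162.63.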


TABLE 8
THE CALCULATION OF THE Q, MD, AND SD FROM DATA GROUPED INTO A FREQUENCY DISTRIBUTION

1. Data from Table 1, fifty Army Alpha scores

(1)               (2)        (3)    (4)        (5)        (6)
Class-intervals   Midpoint   f      x          fx         fx²
Scores            X
195-199           197         1     26.20      26.20      686.44
190-194           192         2     21.20      42.40      898.88
185-189           187         4     16.20      64.80     1049.76
180-184           182         5     11.20      56.00      627.20
175-179           177         8      6.20      49.60      307.52
170-174           172        10      1.20      12.00       14.40
165-169           167         6     −3.80     −22.80       86.64
160-164           162         4     −8.80     −35.20      309.76
155-159           157         4    −13.80     −55.20      761.76
150-154           152         2    −18.80     −37.60      706.88
145-149           147         3    −23.80     −71.40     1699.32
140-144           142         1    −28.80     −28.80      829.44
N = 50                                 Σ|fx| = 502.00   Σfx² = 7978.00

Mean = 170.80 (Table 5, p. 35)

N/4 = 12.5, and 3N/4 = 37.5

$Q_1 = 159.5 + \frac{2.5}{4} \times 5 = 162.63 \qquad Q_3 = 174.5 + \frac{7.5}{8} \times 5 = 179.19$

$Q = \frac{Q_3 - Q_1}{2} = \frac{179.19 - 162.63}{2} = 8.28$

$MD = \frac{\sum |fx|}{N} = \frac{502.00}{50} = 10.04$

$\sigma = \sqrt{\frac{\sum fx^2}{N}} = \sqrt{\frac{7978.00}{50}} = 12.63$

2. Data from Table 3, p. 14, 200 cancellation scores

(1)               (2)        (3)    (4)        (5)        (6)
Class-intervals   Midpoint   f      x          fx         fx²
Scores            X
135.5 to 139.5    137.5       3     18.06      54.18      978.49
131.5 to 135.5    133.5       5     14.06      70.30      988.42
127.5 to 131.5    129.5      16     10.06     160.96     1619.26
123.5 to 127.5    125.5      23      6.06     139.38      844.64
119.5 to 123.5    121.5      52      2.06     107.12      220.67
115.5 to 119.5    117.5      49     −1.94     −95.06      184.42
111.5 to 115.5    113.5      27     −5.94    −160.38      952.66
107.5 to 111.5    109.5      18     −9.94    −178.92     1778.46
103.5 to 107.5    105.5       7    −13.94     −97.58     1360.27
N = 200                                Σ|fx| = 1063.88  Σfx² = 8927.29

Mean = 119.44 (Table 5)

N/4 = 50, and 3N/4 = 150

$Q_1 = 111.5 + \frac{25}{27} \times 4 = 115.20 \qquad Q_3 = 119.5 + \frac{49}{52} \times 4 = 123.27$

$Q = \frac{Q_3 - Q_1}{2} = \frac{123.27 - 115.20}{2} = 4.04$

$MD = \frac{\sum |fx|}{N} = \frac{1063.88}{200} = 5.32$

$\sigma = \sqrt{\frac{\sum fx^2}{N}} = \sqrt{\frac{8927.29}{200}} = 6.68$


The quartiles Q₁ and Q₃ mark off the limits of the middle 50% of scores in the distribution, and the distance between these points is called the interquartile range. Q is one-half the range of the middle 50%, or the semi-interquartile range. Since Q measures the average distance of the quartile points from the median, it is a good measure of score density around the middle of the distribution. If the scores of a distribution are packed closely together, the quartiles will be near to one another and Q will be small; if the scores are widely scattered, the quartiles will be relatively far apart, and Q will be large.

When the distribution is asymmetrical or "skewed," Q₁ and Q₃ are at unequal distances from the median, and the difference between (Q₃ − Mdn) and (Mdn − Q₁) gives a measure of the amount and direction of the skewness (p. 119). When the distribution is symmetrical or normal, Q marks off exactly the 25% of cases just above, and the 25% of cases just below, the median. The median then lies just halfway between the two quartile points Q₁ and Q₃. In a normal distribution Q is commonly called the PE (probable error). The terms Q and PE are often used interchangeably, but it is best to restrict the use of the term PE to the measurement of reliability (p. 187).

Steps in calculating Q may be summarized as follows:

To find Q₁
(1) Divide N by 4.
(2) Begin at the low-score end of the distribution, and count off the scores in order up to the interval which contains Q₁.
(3) Divide the number of scores necessary to locate Q₁ (i.e., to complete N/4) by the frequency in the interval reached in (2) above, and multiply the result by the class-interval.
(4) Add the amount obtained in (3) to the lower limit of the class-interval within which Q₁ lies. This gives Q₁.

To find Q₃
(1) Find 3/4 of N.
(2) Begin at the low-score* end of the distribution, and count up the scores until the interval which contains Q₃ is reached.
(3) Divide the number of scores required to locate Q₃ by the frequency within the interval reached in (2), and multiply the result by the class-interval.
(4) Add the amount obtained in (3) to the lower limit of the class-interval within which Q₃ lies. This gives Q₃.

To find Q
Substitute Q₃ and Q₁ in formula (6).

* Q₃ may also be found by counting off 25% of N from the high-score end of the distribution. To avoid confusion, however, the method given above is recommended to the beginner.
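The counting-and-interpolation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a procedure given in the text: the function name `grouped_quantile` and the `(exact lower limit, width, f)` tuple layout are choices made for the example, and scores are assumed to be spread evenly over each class-interval, as the text assumes.

```python
def grouped_quantile(intervals, fraction):
    """Locate the point below which `fraction` of N falls.

    `intervals` lists (exact lower limit, interval width, f) tuples,
    ordered from the low-score end upward. Scores are assumed to be
    spread evenly within each class-interval (linear interpolation).
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must lie between 0 and 1")
    n = sum(f for _, _, f in intervals)
    target = fraction * n          # N/4 for Q1, 3N/4 for Q3
    counted = 0                    # scores counted off so far from the low end
    for lower, width, f in intervals:
        if f > 0 and counted + f >= target:
            # Steps (3) and (4): share of the interval needed, added to its lower limit
            return lower + (target - counted) / f * width
        counted += f
    raise ValueError("distribution has no scores")


# Table 8, problem 1: fifty Army Alpha scores (140-144 up to 195-199)
alpha = [(139.5, 5, 1), (144.5, 5, 3), (149.5, 5, 2), (154.5, 5, 4),
         (159.5, 5, 4), (164.5, 5, 6), (169.5, 5, 10), (174.5, 5, 8),
         (179.5, 5, 5), (184.5, 5, 4), (189.5, 5, 2), (194.5, 5, 1)]
q1 = grouped_quantile(alpha, 0.25)
q3 = grouped_quantile(alpha, 0.75)
q = (q3 - q1) / 2                  # formula (6)

# Table 8, problem 2: 200 cancellation scores (103.5 to 107.5 up to 135.5 to 139.5)
cancel = [(103.5, 4, 7), (107.5, 4, 18), (111.5, 4, 27), (115.5, 4, 49),
          (119.5, 4, 52), (123.5, 4, 23), (127.5, 4, 16), (131.5, 4, 5),
          (135.5, 4, 3)]
q1_c = grouped_quantile(cancel, 0.25)
q3_c = grouped_quantile(cancel, 0.75)
q_c = (q3_c - q1_c) / 2

print(f"Problem 1: Q1 = {q1:.3f}, Q3 = {q3:.3f}, Q = {q:.3f}")
print(f"Problem 2: Q1 = {q1_c:.3f}, Q3 = {q3_c:.3f}, Q = {q_c:.3f}")
```

Carried at full precision, problem 2 gives a Q of about 4.03; the 4.04 of Table 8 comes from subtracting the already-rounded quartiles 123.27 and 115.20, so the two agree to within rounding.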


3. The Mean Deviation or MD 
(1) Calculation of MD from Ungrouped Data

The mean deviation or MD (also written average deviation or AD, and mean variation or MV) is the mean of the deviations of all the separate measures in a series taken from their central tendency (usually the arithmetic mean; less frequently the median or mode). In averaging deviations to find the MD, no account is taken of signs, and all deviations, whether positive or negative, are treated as positive.

An example will make our definition clearer. If we have five scores, 6, 8, 10, 12, and 14, the mean is easily found to be 10. It is then a simple process to find the deviation of each measure from this mean by subtracting the mean from each measure. Thus 6, the first score, minus 10 equals −4; 8 − 10 = −2; 10 − 10 = 0; 12 − 10 = 2; and 14 − 10 = 4. The five deviations measured from the mean are −4, −2, 0, 2, and 4. If we add these deviations without regard to signs the sum is 12; and dividing 12 by 5 (N), we get 2.4 as the mean of the five deviations from their mean, or the MD. The formula for the MD when scores are ungrouped may be written


$$MD = \frac{\sum |x|}{N} \qquad (7)$$

(mean deviation for ungrouped measures)


in which Σ|x| denotes the sum of the deviations from the mean, taken without regard to sign, and N is, as before, the number of cases or items. The bars | | enclosing x mean that signs are disregarded. The small letter x in the formula always represents the deviation of a score X from its mean M, i.e., x = X − M.
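Formula (7) translates directly into a few lines of Python. This is a minimal sketch; the function name `mean_deviation` is an illustrative choice, and deviations are taken from the arithmetic mean, the usual case named in the text.

```python
def mean_deviation(scores):
    """MD of ungrouped scores, formula (7): the sum of |x| divided by N,
    where x = X - M is each score's deviation from the arithmetic mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(score - mean) for score in scores) / n


md = mean_deviation([6, 8, 10, 12, 14])   # the five scores of the example above
print(md)
```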


(2) Calculation of MD from Grouped Data

In Table 8 the calculation of the MD for scores grouped into a frequency distribution is illustrated by two problems. The mean of the fifty Army Alpha scores in problem 1 has already been found in Table 5 to be 170.80. To compute the MD of the scores in this distribution we must take our deviations (x's) around this mean. However, since the scores have been grouped into class-intervals, we are unable to get the deviation of each separate score from the mean. In lieu of separate score deviations, therefore, we take the deviation of the midpoint of each interval from the mean. The substitution of the midpoint for all of the scores within an interval is the only difference between the computation of x's from grouped and from ungrouped data. The x of 195-199, for example, is 26.20, found by subtracting 170.80 (the mean) from 197.00 (the midpoint of the interval). All of the x's are positive as far down as 170-174, as in each case the midpoint is numerically larger than the mean. From the interval 165-169 on down to the low-score end of the series, the x's are negative, as the midpoints of these intervals are all smaller than 170.80. Thus the x of interval 165-169 is −3.80; and the x of the lowest interval in the distribution, 140-144, is −28.80.

It will be helpful in calculating deviations from the mean to remember that the mean is always subtracted from the individual score or midpoint value. That is, x (deviation) = X (score or midpoint) − M (mean). The calculation is algebraic: when the score or midpoint is numerically larger than the mean the deviation is positive; when the score or midpoint is numerically smaller than the mean the deviation is negative.

Column (4) of Table 8 gives the deviation of each class-interval, as represented by its midpoint, from the mean of the distribution. There are more scores on some intervals than on others; hence each midpoint deviation in column (4) must be "weighted" or multiplied by the number of scores (f) which it represents. This gives the fx column, column (5). The first fx is 26.20; for, since there is only one score on 195-199, we multiply the first x by 1. The next fx is 42.40, since each of the two scores on 190-194 has an x of 21.20. In the same way we obtain the other fx's by multiplying, in each case, the x in column (4) by its corresponding f in column (3). When all of the fx's have been calculated, the column is added without regard to sign, and the resulting sum is divided by N to give the MD. In the present problem the MD equals 502.00/50, or 10.04.

The formula for the MD when measures are grouped into a frequency distribution is as follows:

$$MD = \frac{\sum |fx|}{N} \qquad (8)$$

(mean deviation for scores grouped into a frequency distribution)


The second problem in Table 8 shows the calculation of the MD for 200 cancellation scores grouped into a frequency distribution in class-intervals of four. The mean of this distribution was found to be 119.44 (Table 5). Hence, the x of the topmost interval, 135.5 to 139.5 (midpoint 137.50), from the mean is 18.06. Since the class-interval is constant in size, the next x may be found by subtracting 4 (the interval) from 18.06; and each succeeding x may be found by subtracting 4 from the x just preceding it.

The fx's in column (5) are found, as shown in problem 1, by weighting each x by the f which it represents, that is, by the f opposite it. The sum of the fx column, taken without regard to sign, is 1063.88; and, since N is equal to 200, from formula (8) we obtain 5.32 as the MD of the scores in this distribution around their mean of 119.44.
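For grouped data the only change is that each midpoint deviation is weighted by its f before the signs are dropped, as in formula (8). The sketch below, with the illustrative helper name `grouped_mean_deviation`, reruns both problems of Table 8 using the means already found in Table 5.

```python
def grouped_mean_deviation(midpoints, freqs, mean):
    """MD for a frequency distribution, formula (8): sum of |fx| over N,
    with x = midpoint - mean standing in for every score in the interval."""
    n = sum(freqs)
    total = sum(f * abs(mid - mean) for mid, f in zip(midpoints, freqs))
    return total / n


# Problem 1: fifty Army Alpha scores, midpoints from 197 down to 142
alpha_mid = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
md_alpha = grouped_mean_deviation(alpha_mid, alpha_f, 170.80)

# Problem 2: 200 cancellation scores, midpoints from 137.5 down to 105.5
cancel_mid = [137.5, 133.5, 129.5, 125.5, 121.5, 117.5, 113.5, 109.5, 105.5]
cancel_f = [3, 5, 16, 23, 52, 49, 27, 18, 7]
md_cancel = grouped_mean_deviation(cancel_mid, cancel_f, 119.44)

print(round(md_alpha, 2), round(md_cancel, 2))
```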

In a symmetrical or normal distribution the MD, when measured off on the scale above and below the mean, marks the limits of the middle 57.5% of the measures. The MD is always slightly larger, therefore, than the Q, which marks off the limits of the middle 50%. A large MD means that the scores of the distribution tend to scatter widely around the central tendency; a small MD, that they tend to be concentrated within a relatively narrow range.


4. The Standard Deviation or SD 

The standard deviation or SD is the measure of variability customarily employed in research. The SD differs from the MD in several respects. In calculating the MD we disregard signs and treat all deviations as positive; in finding the SD we avoid this difficulty of signs by squaring the separate deviations. Again, the squared deviations used in computing the SD are always taken from the mean of the distribution, and never from the median or mode. The conventional symbol used to denote the SD is the Greek letter sigma (σ).
(1) Calculation of SD from Ungrouped Data

The standard deviation or σ is the square root of the mean of the squared deviations taken from the arithmetic mean of the distribution. To illustrate the calculation of the SD in a simple ungrouped series, let us consider the example given on page 55 to illustrate the calculation of the MD, in which the deviations of the five measures 6, 8, 10, 12, and 14 from their mean of 10 were found to be −4, −2, 0, 2, and 4, respectively. Squaring each of these deviations, we obtain 16, 4, 0, 4, and 16. Summing these five squares (40) and dividing by five, we obtain the mean of the squares (8); and, extracting the square root, we get 2.83, the SD of this series. The formula for the SD or σ when the series of scores is ungrouped is as follows:


$$\sigma = \sqrt{\frac{\sum x^2}{N}} \qquad (9)$$

(standard deviation calculated from ungrouped data)
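A minimal Python sketch of formula (9), with an illustrative function name. Note that the divisor is N, as in the text; Python's standard `statistics.pstdev` uses the same divisor, whereas `statistics.stdev` divides by N − 1 and would give a somewhat larger value for a small series like this one.

```python
import math


def standard_deviation(scores):
    """SD of ungrouped scores, formula (9): the square root of the mean
    of the squared deviations from the arithmetic mean (divisor N)."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((score - mean) ** 2 for score in scores) / n)


sd = standard_deviation([6, 8, 10, 12, 14])   # squares sum to 40; 40 / 5 = 8
print(round(sd, 2))
```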
(2) Calculation of SD from Grouped Data

Table 8 illustrates the calculation of σ when scores are grouped into a frequency distribution. The method is identical with that used for ungrouped items, except that in addition to squaring the x of each midpoint from the mean, we weight each of these squared deviations by the frequency which it represents, that is, by the frequency opposite it. This multiplication gives the fx² column. By simple algebra, x × fx = fx²; and accordingly the easiest way to obtain the entries in column fx² is to multiply the corresponding x's and fx's in columns (4) and (5). The first fx² entry, for example, is 686.44, the product of 26.20 times 26.20; the second entry is 898.88, the product of 42.40 times 21.20; and so on to the end of the column. All of the fx²'s are necessarily positive, since each negative x is matched by a negative fx. The sum of the fx² column (7978.00) divided by N (50) gives the mean of the squared deviations as 159.56; and the square root of this result is 12.63, the SD. The formula for σ when data are grouped into a frequency distribution is:


$$\sigma = \sqrt{\frac{\sum fx^2}{N}} \qquad (10)$$
(SD or σ for data grouped into a frequency distribution)


Problem 2 of Table 8 furnishes another illustration of the calculation of σ from grouped data. In column (6), the fx² entries have been obtained, as in the previous problem, by multiplying each x by its corresponding fx. The sum of the fx² column is 8927.29, and N is 200. Hence, applying formula (10), we get 6.68 as the SD.
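Formula (10) can be checked against both problems of Table 8 with a short sketch. The helper name `grouped_sd` is illustrative; the midpoints, frequencies, and means are those of Table 8.

```python
import math


def grouped_sd(midpoints, freqs, mean):
    """SD for a frequency distribution, formula (10): square root of
    sum(f * x**2) / N, with x = midpoint - mean for each interval."""
    n = sum(freqs)
    sum_fx2 = sum(f * (mid - mean) ** 2 for mid, f in zip(midpoints, freqs))
    return math.sqrt(sum_fx2 / n)


# Table 8 data, midpoints listed from the top interval down
alpha_mid = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
cancel_mid = [137.5, 133.5, 129.5, 125.5, 121.5, 117.5, 113.5, 109.5, 105.5]
cancel_f = [3, 5, 16, 23, 52, 49, 27, 18, 7]

sd_alpha = grouped_sd(alpha_mid, alpha_f, 170.80)     # sqrt(7978.00 / 50)
sd_cancel = grouped_sd(cancel_mid, cancel_f, 119.44)  # sqrt(about 8927.3 / 200)
print(round(sd_alpha, 2), round(sd_cancel, 2))
```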

The standard deviation is less affected by sampling errors (p. 196) than is the Q or the MD, and is a more stable measure of dispersion. In a normal distribution the SD, when measured off above and below the mean, marks the limits of the middle 68.26% (roughly the middle two-thirds) of the distribution. This is approximately true also of the σ in less symmetrical distributions. For example, in the first problem in Table 8 the middle 65%* of the scores fall between score 183 (170.80 + 12.63) and score 158 (170.80 − 12.63). The SD is always larger than the MD, which is, in turn, always larger than Q. These relationships supply a rough check upon the accuracy of the measures of variability.

* See page 135 for the method of calculating the percentage of scores falling between two points in a frequency distribution.


II. CALCULATION OF THE SD BY THE SHORT METHOD


1. Calculation of σ from Grouped Data

On page 41, the Short Method of calculating the mean was outlined. This method consisted essentially in "guessing" or assuming a mean, and later applying to this value a correction in order to obtain the actual mean. The Short Method may also be used in calculating the SD.* It is a decided time and labor saver in dealing with grouped data, and is well-nigh indispensable in the calculation of σ's in a correlation table.

The Short Method of calculating the SD is illustrated in Table 9. The computation of the mean is repeated in the table, as is also the calculation of the mean and SD by the direct or Long Method. This procedure affords a readier comparison of the two techniques.

TABLE 9
THE CALCULATION OF THE SD BY THE SHORT METHOD.† DATA FROM TABLE 1. CALCULATIONS BY THE LONG METHOD GIVEN FOR COMPARISON

1. Short Method

(1)        (2)        (3)    (4)    (5)           (6)
Scores     Midpoint   f      x'     fx'           fx'²
195-199    197         1      5       5            25
190-194    192         2      4       8            32
185-189    187         4      3      12            36
180-184    182         5      2      10            20
175-179    177         8      1       8 (+43)       8
170-174    172        10      0       0             0
165-169    167         6     −1      −6             6
160-164    162         4     −2      −8            16
155-159    157         4     −3     −12            36
150-154    152         2     −4      −8            32
145-149    147         3     −5     −15            75
140-144    142         1     −6      −6 (−55)      36
N = 50                              −12           322

1. AM = 172.00    $c = \frac{-12}{50} = -.24$    $c^2 = .0576$    $ci = -.24 \times 5 = -1.20$
   M = 172.00 − 1.20 = 170.80

2. $SD = \sigma = \sqrt{\frac{\sum fx'^2}{N} - c^2} \times i = \sqrt{\frac{322}{50} - .0576} \times 5 = 12.63$

2. Long Method

(1)        (2)        (3)    (4)     (5)       (6)       (7)
Scores     Midpoint   f      fX      x         fx        fx²
           X
195-199    197         1      197    26.20     26.20     686.44
190-194    192         2      384    21.20     42.40     898.88
185-189    187         4      748    16.20     64.80    1049.76
180-184    182         5      910    11.20     56.00     627.20
175-179    177         8     1416     6.20     49.60     307.52
170-174    172        10     1720     1.20     12.00      14.40
165-169    167         6     1002    −3.80    −22.80      86.64
160-164    162         4      648    −8.80    −35.20     309.76
155-159    157         4      628   −13.80    −55.20     761.76
150-154    152         2      304   −18.80    −37.60     706.88
145-149    147         3      441   −23.80    −71.40    1699.32
140-144    142         1      142   −28.80    −28.80     829.44
N = 50                       8540                        7978.00

$M = \frac{8540}{50} = 170.80 \qquad \sigma = \sqrt{\frac{7978.00}{50}} = 12.63$

* The MD may also be calculated by the Short Method. The MD is so rarely used, however, that the Short Method of calculation (which is neither very short nor very satisfactory) is not given.

† The calculation of the mean is repeated from an earlier table.
The formula for computing σ by the Short Method is

$$\sigma = \sqrt{\frac{\sum fx'^2}{N} - c^2} \times i \qquad (11)$$

(SD from a frequency distribution when deviations are taken from an assumed mean)

in which Σfx'² is the sum of the squared deviations in units of class-interval, taken from the assumed mean, c² is the squared correction in units of class-interval, and i is the length of the class-interval.


The calculation of σ by the Short Method may be followed from Table 9. Deviations are taken from the assumed mean (172.00) in units of class-interval and entered in column (4) as x''s. In column (5) each x' is weighted or multiplied by its f, and in column (6) the fx'²'s are found by multiplying each x' in column (4) by the corresponding fx' in column (5). The process is identical with that used in the Long Method except that the x''s are all expressed in units of class-interval. This considerably simplifies the multiplication. The calculation of c has already been described on page 44: c is the algebraic sum of column (5) divided by N. The sum of the fx'² column is 322, and c² is .0576. Applying formula (11), we get 2.526 × 5 (interval), or 12.63, as the σ of the distribution. Formula (11) for the calculation of σ by the Short Method holds good no matter what the size of c, the correction in units of class-interval, or where the mean has been assumed.
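The Short Method can be sketched as follows. This is an illustration only; the function name `short_method` and its argument order are assumptions. The deviations x' are whole numbers of class-intervals measured from the assumed mean of Table 9 (the midpoint 172 of interval 170-174).

```python
import math


def short_method(freqs, x_prime, assumed_mean, interval):
    """Mean and SD by the Short Method.

    x_prime holds each interval's deviation from the assumed mean in
    class-interval units. Returns (c, mean, sd); sd follows formula (11).
    """
    n = sum(freqs)
    sum_fx = sum(f * d for f, d in zip(freqs, x_prime))        # column (5)
    sum_fx2 = sum(f * d * d for f, d in zip(freqs, x_prime))    # column (6)
    c = sum_fx / n                        # correction in class-interval units
    mean = assumed_mean + c * interval    # AM + ci
    sd = math.sqrt(sum_fx2 / n - c ** 2) * interval
    return c, mean, sd


# Table 9: intervals 195-199 down to 140-144, AM taken at 172.00
freqs = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
x_prime = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6]
c, mean, sd = short_method(freqs, x_prime, 172.00, 5)
print(c, round(mean, 2), round(sd, 2))
```

The result matches the Long Method because the two are algebraically the same: Σfx²/N about the true mean equals i² × (Σfx'²/N − c²), so only the size of the numbers being multiplied differs.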


2. Calculation of σ from the Original Measures or Scores

It will often save time and labor to apply the Short Method for computing σ directly to the ungrouped scores. The method is illustrated in Table 10. Note that the ten scores are ungrouped, and that it is not necessary even to arrange them in order of size. The assumed mean is taken at zero, and each score becomes at once a deviation (x') from this AM; that is, each score (X) is unchanged. The correction, c, is the difference between the actual mean (M) and the assumed mean (0), i.e., c = M − 0; hence c is simply M itself. The mean is calculated, as before, by summing the scores and dividing by N (see p. 32). To find σ, we square the x''s (or the X's, which are the scores), sum them to get Σ(x')² or ΣX², divide by N, and subtract M², the correction squared. The square root of the result gives σ. A convenient formula is

$$\sigma = \sqrt{\frac{\sum X^2}{N} - M^2} \qquad (12)$$

or, replacing M² by (ΣX/N)²,

$$\sigma = \frac{\sqrt{N\sum X^2 - \left(\sum X\right)^2}}{N} \qquad (13)$$

(σ calculated from original scores by the Short Method)


This method of calculating σ is especially useful when there are relatively few scores, say fifty or less, and when the scores are expressed in not more than two digits,* so that the squares do not become unwieldy. A calculating machine and a table of squares will greatly facilitate computation. Simply sum the scores as they stand and divide by N to get M. Then enter the squares of the scores in the machine in order, sum, and substitute the result in formula (12) or formula (13).


TABLE 10
TO ILLUSTRATE THE CALCULATION OF THE SD FROM ORIGINAL SCORES WHEN THE ASSUMED MEAN IS TAKEN AT ZERO, AND DATA ARE UNGROUPED

Scores (X)    x' (or X)    (x')² (or X²)
18            18           324
25            25           625
21            21           441
19            19           361
27            27           729
31            31           961
22            22           484
25            25           625
28            28           784
20            20           400
236           236          5734

$M = \frac{236}{10} = 23.6$

$\sigma = \sqrt{\frac{5734}{10} - (23.6)^2} \times 1 \text{ (interval)} = \sqrt{16.44} = 4.05$
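Formulas (12) and (13) can be compared side by side on the Table 10 scores. A minimal sketch; the function name `sd_from_original_scores` is illustrative, and the assumed mean is zero, so the scores serve directly as deviations.

```python
import math


def sd_from_original_scores(scores):
    """M and sigma from ungrouped scores with the AM taken at zero.
    Returns (M, sigma by formula (12), sigma by formula (13))."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    mean = sum_x / n
    sd_12 = math.sqrt(sum_x2 / n - mean ** 2)        # formula (12)
    sd_13 = math.sqrt(n * sum_x2 - sum_x ** 2) / n   # formula (13)
    return mean, sd_12, sd_13


scores = [18, 25, 21, 19, 27, 31, 22, 25, 28, 20]    # Table 10
mean, sd_12, sd_13 = sd_from_original_scores(scores)
print(mean, round(sd_12, 2), round(sd_13, 2))
```

In Python the integer products inside formula (13) are exact, so M never has to be rounded before squaring; formula (12) works in floating point, where subtracting two large, nearly equal quantities can lose precision when the scores are big. This echoes the text's caution about unwieldy squares.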


* For the application of this method to the calculation of coefficients of correlation, and a scheme for reducing the size of the original scores so as to eliminate the need for handling large numbers, see page 293.


… pounds. In which trait is the group more variable? Since we cannot compare inches and pounds directly, it is impossible to answer this question by comparing the SD's of the height and weight distributions. But we can compare the relative variability of the two traits in terms of their coefficients of variation. Thus,

$V_{ht} = \frac{100 \times 2.5}{\text{mean height}} = 5.6$ by formula (14)

and

$V_{wt} = \frac{100 \times 6.0}{50} = 12$ by formula (14)

from which it appears that these boys are 5.6/12, or 47%, as variable in height as in weight.

Let us consider the case where variability is measured in the same units, but around different points on the scale. At the end of five minutes, a group of fifty children had worked an average of 20.50 examples correctly, the σ being 5.24. At the end of ten minutes, the same group had worked an average of 34.80 examples correctly, the σ being 9.62. If we compared the σ's of the two distributions directly, we should probably be inclined to conclude that the group was nearly twice as variable at the end of the ten-minute period as it was at the end of the five-minute period, since the σ has increased from 5.24 to 9.62. This conclusion is correct as far as the absolute spread or variability within the group is concerned. But to compare the relative dispersion of the group in the two periods, we must take account of the fact that, with the increase in σ, the means have also increased from 20.50 to 34.80. The coefficients of variation give the following results:


For the five-minute period: $V = \frac{100 \times 5.24}{20.50} = 25.6$

For the ten-minute period: $V = \frac{100 \times 9.62}{34.80} = 27.6$

Thus, instead of being about 50% as variable in the five-minute period as in the ten, the group is 25.6/27.6, or 93%, as variable, when the mean score is considered as well as the absolute variability.
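The comparisons above all reduce to the ratio 100σ/M, which is the form the worked examples use with formula (14). A minimal sketch with an illustrative function name:

```python
def coefficient_of_variation(sd, mean):
    """V = 100 * sigma / M, as in the worked examples of this section."""
    return 100 * sd / mean


v_five = coefficient_of_variation(5.24, 20.50)   # five-minute period
v_ten = coefficient_of_variation(9.62, 34.80)    # ten-minute period
ratio = v_five / v_ten
print(round(v_five, 1), round(v_ten, 1), round(ratio, 3))
```

Without rounding the V's first, the ratio comes out near .925, or about 92%; the 93% in the text results from dividing the rounded values 25.6 and 27.6. Either way the conclusion stands: relative to their means, the two periods show nearly equal variability.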

Objection has been raised* to the use of V in comparing the relative variability of test scores because the "true" zero point of ability in mental and educational tests is unknown. This objection does not apply, of course, to physical and physiological measures, since these have true zeros. How the lack of knowledge of the true zero in a mental test may affect V can be shown most readily, perhaps, by an example. Suppose that we have given a vocabulary test to a group of children, and have obtained a mean of 25 and a σ of 5. V will equal 20. Now suppose that we add 30 very easy items, say, to our vocabulary test. It is highly probable that every child will know all of the added words, and hence the mean score as well as every subject's score will be increased by 30. The absolute variability of the group (the σ) will, however, remain unchanged, as each subject occupies exactly the same relative position as before. An increase in the mean (from 25 to 55) without a corresponding increase in σ changes V from 20 to 9; and, since we could add 40 or 400 items as easily as 30, V appears to be a very unstable measure.
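The arithmetic of this objection is easy to reproduce. The sketch below uses a hypothetical set of four scores, chosen only because their mean is 25 and their σ (divisor N) is 5; adding 30 points to every score, as the easy items would, leaves σ untouched but lowers V from 20 to about 9.

```python
import statistics

# Hypothetical scores: mean 25, population SD (divisor N) 5
scores = [20, 30, 20, 30]
shifted = [score + 30 for score in scores]   # 30 easy items every child passes


def coefficient_of_variation(data):
    """V = 100 * sigma / M, with sigma computed using divisor N."""
    return 100 * statistics.pstdev(data) / statistics.mean(data)


print(statistics.pstdev(scores), statistics.pstdev(shifted))   # sigma is unchanged
print(coefficient_of_variation(scores), round(coefficient_of_variation(shifted), 2))
```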

While theoretically correct, criticism of V because of the arbitrary nature of the zero point in mental and educational tests is not so generally destructive as it seems. Makers of standard psychological tests have been careful to begin their tests with items which, by experimental tryout, have been found to have minimal difficulty for the group for whom the test is designed. While admittedly arbitrary, such "zero" points are at least located at extremely low levels of difficulty in the ability measured by the test; hence it would be foolish to include additional easy items at the low end of the scale. The mean tells us how far the group has progressed, on the average, from the arbitrary zero point of the test; V shows,

* Franzen, R., "Statistical Issues," Journal of Educational Psychology, 15 (…). See also "The Absolute Zero in Intelligence Measurement," Psychological Review, 35 (1928), 175-197.


essentially, what percentage the variability is of this distance. Like M, V has a definite meaning for the test as it stands. If the range of difficulty in the test is altered, or the units changed, not only V, but M, is changed. V, therefore, is in a sense no more arbitrary than M, and the objections raised against this measure can be directed with equal force against M.

V is most useful, perhaps, in comparing the variability of a 
group upon the same test administered under different con- 
ditions, as, for example, when a group of students works at a 
task with and without distraction. The zero point here, at 
least, remains substantially constant. V may also be used to 
compare two or more groups on the same test, as when ten- 
year-old boys and ten-year-old girls are compared in tests of 
logical memory or picture completion. In both of these cases 
it is probably justifiable to assume that the “true” zero point 
of ability is sensibly the same for the groups compared. 

It is, perhaps, most difficult to interpret V when the variability of a group upon different mental tests is a matter of interest. If we compare a group of girls for variability in paragraph reading and in arithmetic computation, it should be made plain that the V's refer only to the specific scales upon which performance has been measured. Other tests of reading and arithmetic may, and probably will, give different results because of differences in test units, range of difficulty covered by the test, and position of arbitrary zero points. But if one restricts his use of V to the particular measures which he has employed, this coefficient will furnish useful information.


IV. THE SHORT METHOD APPLIED TO DISCRETE SERIES

We have defined a truly discrete series on page 2 as one in which there are real gaps. This means that in a discrete series each measure, instead of representing an interval on a scale as in a continuous series, is a separate and distinct value. There is, for example, a real gap between one man and two men; or between one dollar and two dollars, provided the unit of measurement in the latter case is one dollar.


Table 11 illustrates the method of calculating the measures of central tendency and variability for discrete measures tabulated into a frequency distribution. The data consist of the records of the number of children in forty-four families in a rural community. In the first column of the table is given the number of children in the family; in the second column (under f), the number of families of a given size. We find, for instance, one family of ten children; three of nine; four of eight; etc. Since the measures (here, the children) are discrete, each measure must be taken at face value, and there are, in consequence, no midpoint values for the different steps.


TABLE 11
TO ILLUSTRATE THE CALCULATION OF THE MEAN, THE MEDIAN, Q AND SD WHEN MEASURES ARE DISCRETE

(Note that the f column gives the number of families containing the children listed in the first column)

Number of     Families    x'     fx'           fx'²
Children      f
10            1            5       5            25
9             3            4      12            48
8             4            3      12            36
7             3            2       6            12
6             5            1       5 (+40)       5
5             8            0       0             0
4             7           −1      −7             7
3             4           −2      −8            16
2             4           −3     −12            36
1             2           −4      −8            32
0             3           −5     −15 (−50)      75
N = 44                           −10           292

AM = 5.0    $c = \frac{-10}{44} = -.23$    $c^2 = .053$    ci = −.23
M = 4.77
Mdn = 5: N/2 = 22; and, since the 22nd measure falls on 5, the Mdn = 5.
Mode = 5
Q₁: N/4 = 11; and, since the 11th measure falls on 3, Q₁ = 3.
Q₃: 3N/4 = 33; and, since the 33rd measure falls between 6 and 7, Q₃ = 6.5.

$Q = \frac{6.5 - 3}{2} = 1.75$

$SD = \sqrt{\frac{292}{44} - .053} \times 1 \text{ (interval)} = 2.57$


The mean is guessed at 5, and x''s are taken directly from this point. The fx' and the fx'² columns are calculated exactly as shown in Table 9 for a continuous series: the first column is obtained by multiplying the corresponding f and x' values, and the second by multiplying corresponding x' and fx' values. Since the class-interval is 1, the correction c equals ci directly. If we apply the correction −.23 to 5.00 (the "guessed" mean), 4.77, the mean of the distribution, is obtained. This result, while mathematically correct, is rather difficult to interpret in a practical way, as it is obviously impossible for a family to have four and a fraction children. Is the median a more meaningful measure? One-half of the measures is 22, and counting in from the small end of the series we find that the twenty-second score falls on 5. Fractional values are, of course, really meaningless in a discrete series; and hence we simply take 5 as being roughly the median of the distribution without any interpolation. The median family, accordingly (and the modal family as well), may be said to contain five children, and this result on the face of it is of greater utility than the statement that the average number of children in a family is 4.77.
In computing measures of variability in a discrete series, the Q is the only one which offers difficulties. In the present illustration, one-fourth (N/4) of the measures is 11; and, counting in eleven scores from the low end of the series, we put Q₁ on 3 (as in the case of the median, no interpolation is made). If we check this value of Q₁ by counting in thirty-three scores from the high end of the distribution, we again obtain 3 as the value of Q₁. Three-fourths (3N/4) of the measures is 33; and, counting in thirty-three scores from the low end, we complete (count through) the frequency on 6. If eleven scores are counted off from the other direction, we complete (count through) the frequency on 7. This puts Q₃ at either 6 or 7, and the best way out of the difficulty is to take Q₃ as roughly equal to 6.5, i.e., midway between 6 and 7. Taking Q₁ equal to 3, and Q₃ equal to 6.5, Q is (6.5 − 3)/2, or 1.75.


The σ in a discrete series is found from formula (11) in exactly the same way as in a continuous series. In Table 11, the σ is $\sqrt{\frac{292}{44} - .053} \times 1$ (the class-interval), or 2.57.
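The Table 11 calculations can be reproduced with a short script. This is a sketch under stated assumptions: the helpers `value_at` and `discrete_point` are illustrative names, N is assumed divisible by 4 (true here, N = 44), and the text's ad hoc treatment of Q₃ (taking the point midway when the counted measure completes one value and the next measure begins another) is generalized as a rule for all three points.

```python
import math

# Table 11: number of children in the family -> number of families (f)
families = {10: 1, 9: 3, 8: 4, 7: 3, 6: 5, 5: 8, 4: 7, 3: 4, 2: 4, 1: 2, 0: 3}
ASSUMED_MEAN = 5
n = sum(families.values())

# Short Method with a class-interval of 1, so c is already in score units
sum_fx = sum(f * (x - ASSUMED_MEAN) for x, f in families.items())
sum_fx2 = sum(f * (x - ASSUMED_MEAN) ** 2 for x, f in families.items())
c = sum_fx / n
mean = ASSUMED_MEAN + c
sd = math.sqrt(sum_fx2 / n - c ** 2)          # formula (11) with i = 1


def value_at(rank):
    """Value on which the rank-th measure falls, counting up from the low end."""
    counted = 0
    for x in sorted(families):
        counted += families[x]
        if counted >= rank:
            return x
    raise ValueError("rank exceeds N")


def discrete_point(rank):
    """No interpolation: if the rank-th and the next measure fall on different
    values, take the point midway between them (as the text does for Q3)."""
    return (value_at(rank) + value_at(min(rank + 1, n))) / 2


median = discrete_point(n // 2)        # N/2 = 22
q1 = discrete_point(n // 4)            # N/4 = 11
q3 = discrete_point(3 * n // 4)        # 3N/4 = 33
q = (q3 - q1) / 2
print(n, round(mean, 2), median, q1, q3, q, round(sd, 2))
```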


WHEN TO USE THE VARIOUS MEASURES OF VARIABILITY

1. Use the range
(1) When the data are too scant or too scattered to justify the calculation of any other measure of variability.
(2) When a knowledge of the total spread of scores is all that is wanted.

2. Use the Q
(1) For a quick, inspectional measure of variability.
(2) When there are scattered or extreme measures.
(3) When the degree of concentration around the median is sought.

3. Use the MD
(1) When it is desired to weight all deviations according to their size.
(2) When extreme deviations should influence the measure of variability, but not influence it unduly.

4. Use the SD
(1) When the measure having the highest degree of reliability is sought (p. 196).
(2) When it is desired that extreme deviations have a proportionally greater influence upon the measure of variability.
(3) When coefficients of correlation or measures of reliability are subsequently to be computed (p. 282).


PROBLEMS

1. Calculate the Q and σ for each of the four frequency distributions given on page 46 under problem 1, Chapter II.
2. Calculate the σ of the twenty-five ungrouped scores given on page 28, problem 5(a), taking the AM at zero. Compare your result with the σ's calculated from the frequency distributions of the same scores which you tabulated in class-intervals of three and five units.


3. For the following list of test scores,
52, 50, 56, 68, 65, 62, 57, 70
(a) Find the M and σ by the method on page 60.
(b) Add 6 to each score and recalculate M and σ.
(c) Subtract 50 from each score, and calculate M and σ.
(d) Multiply each score by 5 and compute M and σ.
4. Calculate coefficients of variation for the following traits:

Trait                  Unit of measurement         Group                         M         σ
Length of Head         mms.                        802 males                     190.52    5.90
Body Weight            pounds                      868,445 males                 141.54    17.82
Tapping Speed          M of 5 trials, 30" each     68 adults, male and female    196.91    26.83
Memory Span            No. repeated correctly      263 males                     6.60      1.13
General Intelligence   Points scored (Otis Group   1101 adults                   153.8     23.6
                       Intell. Scale)

Rank these traits in order for relative variability. Judged by their V's, which trait is the most variable? which the least variable? which traits have true zeros?


5. (a) Why is the Q the best measure of variability when there are scattered or extreme scores?
(b) Why does the σ weight extreme deviations more than does the MD?
ANSWERS
1. (1) Q = 3.38, σ = 4.99     (2) Q = 8.13, σ = 11.33
   (3) Q = 4.50, σ = 7.23     (4) Q = 16.41, σ = 24.13
2. σ of ungrouped scores = 6.72
   σ of scores grouped in 3-unit intervals = 6.71
   σ of scores grouped in 5-unit intervals = 6.78
3. (a) M = 60, σ = 6.91     (b) M = 66, σ = 6.91     (c) M = 10, σ = 6.91     (d) M = 300, σ = 34.55
4. V's in order are 3.10; 12.59; 13.63; 17.12; 15.39. Ranked for relative variability from most to least: Memory Span; General Intelligence; Tapping Speed; Weight; Head Length. The last two traits have true zeros.
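The arithmetic behind answers 3 and 4 can be checked with a short Python sketch. It assumes σ is computed in the form that divides by N, and that V = 100σ/M; the helper names are illustrative only. The output shows that adding or subtracting a constant moves M but leaves σ unchanged, while multiplying every score by 5 multiplies both M and σ by 5. With the M and σ printed for General Intelligence, V works out to about 15.34 rather than the 15.39 listed above; the ranking is the same either way.

```python
import math


def mean_sd(scores):
    """Return M and sigma of ungrouped scores (sigma divides by N)."""
    n = len(scores)
    m = sum(scores) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in scores) / n)
    return m, sd


def coefficient_of_variation(m, sd):
    """V = 100 * sigma / M."""
    return 100 * sd / m


# Problem 3: the same eight scores under four transformations
scores = [52, 50, 56, 68, 65, 62, 57, 70]
versions = {
    "(a) original": scores,
    "(b) plus 6": [x + 6 for x in scores],
    "(c) minus 50": [x - 50 for x in scores],
    "(d) times 5": [x * 5 for x in scores],
}
for label, xs in versions.items():
    m, sd = mean_sd(xs)
    print(f"{label}: M = {m:.2f}, sigma = {sd:.2f}")

# Problem 4: (M, sigma) for each trait, copied from the table
traits = {
    "Head length": (190.52, 5.90),
    "Body weight": (141.54, 17.82),
    "Tapping speed": (196.91, 26.83),
    "Memory span": (6.60, 1.13),
    "General intelligence": (153.8, 23.6),
}
v = {name: coefficient_of_variation(m, sd) for name, (m, sd) in traits.items()}
for name in sorted(v, key=v.get, reverse=True):
    print(f"{name}: V = {v[name]:.2f}")
```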


CHAPTER IV 


CUMULATIVE DISTRIBUTIONS, GRAPHIC 
METHODS, AND PERCENTILES 


In Chapter I, we learned how to represent the frequency distribution by means of the polygon and the histogram. In the present chapter, other descriptive methods will be considered: the cumulative frequency graph, the cumulative percentage curve or ogive, and certain simple graphical devices. Also, methods will be given for calculating percentiles and percentile ranks from frequency distributions and directly from graphs.


I. THE CUMULATIVE FREQUENCY GRAPH


1. Construction of the Cumulative Frequency Graph 

The cumulative frequency graph is another way of representing a frequency distribution by means of a diagram. Before we can plot a cumulative frequency graph, the scores of the distribution must be added serially or cumulated, as shown in Table 12, for the two distributions taken from Table 5, page 35. These two sets of scores have already been used to illustrate the frequency polygon and histogram in Figures 2, 4, and 5. The first two columns for each of the distributions in Table 12 repeat Table 5, page 35, exactly; but in the third column (Cum. f) scores have been "accumulated" progressively from the bottom of the distribution upward. To illustrate, in the distribution of Army Alpha scores the first "cumulative frequency" is 1; 1 + 3, from the low end of the distribution, gives 4 as the next entry; 4 + 2 = 6; 6 + 4 = 10, etc. The last cumulative frequency is, of course, equal to 50 or N, the total frequency.
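Cumulating is simply a running sum taken from the lowest interval upward. A minimal Python sketch, assuming the frequencies are listed from the bottom of Table 12 (140-144 for Army Alpha, 103.5 to 107.5 for Cancellation) upward:

```python
from itertools import accumulate

# Frequencies from Table 12, lowest class-interval first
army_alpha_f = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]   # 140-144 ... 195-199
cancellation_f = [7, 18, 27, 49, 52, 23, 16, 5, 3]      # 103.5-107.5 ... 135.5-139.5

# Running totals: each entry is the number of scores up to that interval's upper limit
army_alpha_cum = list(accumulate(army_alpha_f))
cancellation_cum = list(accumulate(cancellation_f))

print(army_alpha_cum)
print(cancellation_cum)
```

The last entry of each list equals N (50 and 200), which is a handy check on the addition.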


TABLE 12
CUMULATIVE FREQUENCIES FOR THE TWO DISTRIBUTIONS GIVEN IN TABLE 5, P. 35

        Army Alpha                          Cancellation
Scores       f    Cum. f        Scores              f    Cum. f
195-199      1      50          135.5 to 139.5      3     200
190-194      2      49          131.5 to 135.5      5     197
185-189      4      47          127.5 to 131.5     16     192
180-184      5      43          123.5 to 127.5     23     176
175-179      8      38          119.5 to 123.5     52     153
170-174     10      30          115.5 to 119.5     49     101
165-169      6      20          111.5 to 115.5     27      52
160-164      4      14          107.5 to 111.5     18      25
155-159      4      10          103.5 to 107.5      7       7
150-154      2       6
145-149      3       4
140-144      1       1
         N = 50                              N = 200

The two cumulative frequency graphs which represent the distributions of Table 12 are shown in Figures 8 and 9. Consider first the graph of the fifty Army Alpha scores in Figure 8.
The class-intervals of the distribution have been laid off along the X-axis. There are twelve intervals, and by the “75% rule” given on page 13 there should be about nine unit distances (each equal to one class-interval) laid off on the Y-axis. Since the largest cumulative frequency is 50, each of these Y-units should represent 50/9 or 6 scores (approximately). Instead of dividing up the total Y-distance into nine units each representing six scores, however, we have, for convenience in plotting, divided the total Y-distance into ten units of five scores each. This does not significantly change the 3:4 relationship of height to width in the figure.

When plotting the frequency polygon, the frequency on each interval is taken at the midpoint of the class-interval. But in constructing a cumulative frequency curve, each cumulative frequency is plotted at the upper limit of the interval upon which it falls. This is because we are adding progressively from the bottom up, and hence each cumulative frequency carries through to the upper limit of the interval. The first point on the curve is one Y-unit (the cumulative frequency on 140-144) just above 144.5; the second point is four Y-units just above 149.5; the third, six Y-units just above 154.5; and so on to the last point, which is fifty Y-units above 199.5. The plotted points are joined to give the S-shaped cumulative frequency graph. In order to have the curve begin on the X-axis, it is started at 139.5 (upper limit of 134.5 to 139.5), the cumulative frequency of which is 0.

[Fig. 8. Cumulative Frequency Graph. X-axis: Scores, 139.5 to 199.5; Y-axis: Cumulative Frequencies, 0 to 50. (Data from Table 12, p. 75.)]
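The plotting rule just described (each cumulative frequency at the upper limit of its interval, with the curve started at 0 on the limit just below the first interval) can be written out as a list of (x, y) points. This Python sketch assumes equal class-intervals; `cumulative_points` is an illustrative name, and the resulting points could be handed to any plotting tool.

```python
from itertools import accumulate


def cumulative_points(start, width, freqs):
    """(x, y) points for a cumulative frequency graph.

    start  -- the limit at which the curve begins with a cumulative frequency of 0
    width  -- length of the class-interval
    freqs  -- frequencies, lowest class-interval first
    """
    points = [(start, 0)]
    for k, cum in enumerate(accumulate(freqs), start=1):
        # each cumulative frequency is carried to the upper limit of its interval
        points.append((start + k * width, cum))
    return points


army_alpha_f = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]   # 140-144 upward
for x, y in cumulative_points(139.5, 5, army_alpha_f):
    print(x, y)
```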

The cumulative frequency curve in Figure 9 has been plotted from the second distribution in Table 12 by the method just described. The curve begins at 103.5, the lower limit of the first class-interval,* and ends at 139.5, the upper limit of the last interval; and cumulative frequencies, 7, 25, 52, etc., are all plotted at the upper limits of their respective class-intervals. The height of this graph was determined by the “75% rule” as in the case of the curve in Figure 8. There are nine class-intervals laid off on the X-axis; hence, since 75% of 9 is 7 (approximately), the height of the figure should be about seven class-interval units. To determine the score value of each Y-unit, divide 200 (the largest cumulative frequency) by 7 to give 30 (approximately). Each of the seven Y-units has been taken to represent 30 scores.

* Or the upper limit of the interval just below, i.e., 99.5 to 103.5.

[Fig. 9. Cumulative Frequency Graph. X-axis: Scores, 103.5 to 139.5; Y-axis: Cumulative Frequencies, 0 to 210. (Data from Table 12, p. 75.)]
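The “75% rule” arithmetic used for Figures 8 and 9 can be sketched in Python as follows. The function name is illustrative; it returns the suggested number of Y-units and the raw number of scores per unit, which the text then rounds to a convenient plotting value (6, and finally 5, for Figure 8; 30 for Figure 9).

```python
def y_scale(n_intervals, largest_cum_f, ratio=0.75):
    """Suggested number of Y-units (about 75% of the X-width, in class-interval
    units) and the number of scores each Y-unit would then represent."""
    units = round(ratio * n_intervals)
    return units, largest_cum_f / units


for name, k, top in [("Figure 8", 12, 50), ("Figure 9", 9, 200)]:
    units, per_unit = y_scale(k, top)
    print(f"{name}: {units} Y-units of about {per_unit:.1f} scores each")
```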


II. PERCENTILES AND PERCENTILE RANKS 


1. Calculation of Percentiles in a Frequency Distribution 

We have learned (p. 36) that the median is that point in a frequency distribution below which lie 50% of the measures or scores, and that Q1 and Q3 mark points in the distribution below which lie, respectively, 25% and 75% of the measures or scores. In exactly the same way in which the median and quartiles are found, we may compute points below which lie 10%, 43%, 85%, or any “percent” of the scores. These points are called percentiles, and are designated, in general, by the symbol Pp, the p referring to the percentage of cases below the given value. P10, for example, is the point below which lie 10% of the scores; P78, the point below which lie 78% of the scores. It is evident that the median, expressed as a percentile, is P50; also Q1 is P25, and Q3 is P75.

The method of calculating percentiles is essentially the same as that employed in finding the median. The formula is

$$P_p = l + \left( \frac{pN - F}{f_p} \right) \times i \qquad (15)$$

(percentiles in a frequency distribution, counting from below up)

where
p = percentage of the distribution wanted, e.g., 10%, 33%, etc.
l = lower limit of the class-interval upon which Pp lies
pN = part of N to be counted off in order to reach Pp
F = sum of all scores upon intervals below l
fp = number of scores within the interval upon which Pp falls
i = length of the class-interval


In Table 13, the percentile points, P10 to P90, have been computed by formula (15) for the distribution of scores made by the fifty college students upon Army Alpha, shown in Table 1, page 6. The details of calculation are given in Table 13. We may illustrate the method with P70. Here, pN = 35 (70% of 50 = 35), and from the Cum. f we find that 30 scores take us through 170-174 up to 174.5, the lower limit of the interval next above. Hence, P70 falls upon 175-179, and, substituting l = 174.5, pN = 35, F = 30, fp = 8 (frequency upon 175-179), and i = 5 (class-interval) in formula (15), we find that P70 = 177.6 (for detailed calculation, see Table 13). This result means that 70% of the fifty students scored below 177.6 in the distribution of Army Alpha scores. The other percentile values are found in exactly the same way as P70. The reader should verify the calculations of the percentiles in Table 13 in order to become thoroughly familiar with the method.

It should be noted that P0, which marks the lower limit of the first interval (namely, 139.5), lies at the beginning of the distribution. P100 marks the upper limit of the last interval and lies at the end of the distribution. These two percentiles represent limiting points. Their principal value is to indicate the boundaries of the percentile scale.

TABLE 13
CALCULATION OF CERTAIN PERCENTILES IN A FREQUENCY DISTRIBUTION
(Data are fifty Army Alpha scores, see Table 1, p. 6)

Scores      f    Cum. f
195-199     1      50
190-194     2      49
185-189     4      47
180-184     5      43
175-179     8      38
170-174    10      30
165-169     6      20
160-164     4      14
155-159     4      10
150-154     2       6
145-149     3       4
140-144     1       1
        N = 50

Percentiles
P100 = 199.5
10% of 50 = 5;    P10 = 149.5 + (5 - 4)/2 × 5 = 152.0
20% of 50 = 10;   P20 = 154.5 + (10 - 6)/4 × 5 = 159.5
30% of 50 = 15;   P30 = 164.5 + (15 - 14)/6 × 5 = 165.3
40% of 50 = 20;   P40 = 164.5 + (20 - 14)/6 × 5 = 169.5
50% of 50 = 25;   P50 = 169.5 + (25 - 20)/10 × 5 = 172.0 (Mdn)
60% of 50 = 30;   P60 = 169.5 + (30 - 20)/10 × 5 = 174.5
70% of 50 = 35;   P70 = 174.5 + (35 - 30)/8 × 5 = 177.6
80% of 50 = 40;   P80 = 179.5 + (40 - 38)/5 × 5 = 181.5
90% of 50 = 45;   P90 = 184.5 + (45 - 43)/4 × 5 = 187.0
P0 = 139.5
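Formula (15) can be sketched in Python as follows, assuming equal class-intervals with frequencies listed from the lowest interval upward and p given as a percent. The function name and arguments are illustrative. Run on the Army Alpha data it reproduces every percentile in Table 13, including the limiting points P0 and P100.

```python
def percentile(p, lowest_limit, width, freqs):
    """P_p = l + ((pN - F) / f_p) * i   (formula 15).

    p            -- percent of N wanted (0 to 100)
    lowest_limit -- lower limit of the lowest class-interval
    width        -- i, the length of the class-interval
    freqs        -- frequencies, lowest class-interval first
    """
    n = sum(freqs)
    target = p * n / 100          # pN
    below = 0                     # F, the frequency below the current interval
    for k, f in enumerate(freqs):
        if f > 0 and below + f >= target:
            l = lowest_limit + k * width   # lower limit of the interval holding P_p
            return l + (target - below) / f * width
        below += f
    raise ValueError("empty distribution")


army_alpha_f = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]   # 140-144 upward
for p in (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100):
    print(f"P{p} = {percentile(p, 139.5, 5, army_alpha_f):.1f}")
```

When pN falls exactly on an interval limit (P20, P40, P60), either adjacent interval yields the same point, so the choice of interval in the loop does not matter.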


2. Calculation of Percentile Ranks in a Frequency Distribution

We have seen in the last section how percentiles, e.g., P15 or P62, may be calculated directly from a frequency distribution. To repeat what has been said above, percentiles are points in a continuous distribution below which lie given percentages of N. We shall now consider the problem of finding an individual's percentile rank (PR), or the position on a scale of 100 to which the subject’s score entitles him. The distinction between percentile and percentile rank will be clear if the reader remembers that in calculating percentiles he starts with a certain percent of N, say 15% or 62%. He then counts into the distribution the given percent, and the point reached is the required percentile, e.g., P15 or P62. The procedure followed in computing percentile ranks is the reverse of this process. Here we begin with an individual score, and determine the percentage of scores which lies below it. If this percentage is 62, say, the score has a percentile rank or PR of 62 on a scale of 100.

We may illustrate with Table 13. What is the PR of a man who scores 163? Score 163 falls on interval 160-164. There are ten scores up to 159.5, the lower limit of this interval (see column Cum. f), and four scores spread over this interval. Dividing 4 by 5 (interval length) gives us .8 score per unit of interval. The score of 163, which we are seeking, is 3.5 score units from 159.5, the lower limit of the interval within which the score of 163 lies. Multiplying 3.5 by .8, we get 2.8 as the score-distance of 163 from 159.5; and adding 2.8 to 10 (number of scores below 159.5), we get 12.8 as the part of N lying below 163. Dividing 12.8 by 50 gives us 25.6% as that proportion of N below 163; hence the percentile rank of score 163 is 26. The diagram below will clarify the calculation:


Interval 160-164 (f = 4, or .8 score per unit of interval):
159.5 |-- .8 --| 160.5 |-- .8 --| 161.5 |-- .8 --| 162.5 |- .4 -| 163.0 |- .4 -| 163.5 |-- .8 --| 164.5

Ten scores lie below 159.5. Prorating the four scores on 160-164 over the interval of 5, we have .8 score per unit of interval. Score 163 is just .8 + .8 + .8 + .4 or 2.8 scores from 159.5; or score 163 lies 12.8 scores or 25.6% (12.8/50) into the distribution.

The PR of any score may be found in the same way. For example, the percentile rank of 181 is 79 (verify it). The reader should note that a score of 163 is taken as 163.0, the midpoint of the score-interval 162.5 to 163.5. This means simply that the midpoint is assumed to be the most representative value in a score-interval. The percentile ranks for several scores may be read directly from Table 13. For instance, 152 has a PR of 10, 172 (median) a PR of 50, and 187 a PR of 90. If we take the percentile-points as representing approximately the score-intervals upon which they lie, the PR of 160 (upon which 159.5 lies) is approximately 20 (see Table 13); the PR of 165 (upon which 165.3 lies) is approximately 30; the PR of 170 is approximately 40; of 175, 60; of 178, 70; of 182, 80. These PR’s are not strictly accurate, to be sure, but the error is slight.
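The percentile-rank steps above (count the scores below the interval, then prorate the interval's frequency over the distance into it) can be sketched in Python. This assumes equal class-intervals and frequencies listed from the lowest interval upward; the function name is hypothetical, and each score is taken at its midpoint (163 as 163.0), as the text specifies.

```python
def percentile_rank(score, lowest_limit, width, freqs):
    """Percent of N lying below `score` in a grouped frequency distribution."""
    n = sum(freqs)
    if score <= lowest_limit:
        return 0.0
    below = 0
    for k, f in enumerate(freqs):
        l = lowest_limit + k * width
        if score < l + width:
            # spread the f scores evenly over the interval and count up to `score`
            return 100 * (below + (score - l) * f / width) / n
        below += f
    return 100.0


army_alpha_f = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]   # 140-144 upward
for s in (152, 163, 172, 181, 187):
    pr = percentile_rank(s, 139.5, 5, army_alpha_f)
    print(f"score {s}: {pr:.1f}% below, PR = {round(pr)}")
```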


3. Calculation of Percentile Ranks When Individuals or Objects Are in Order of Merit

Percentile ranks are often used in experimental psychology 
when we are dealing with attributes for which individuals or 
objects may be arranged in order of merit, but in which they 
cannot be measured directly. Children, for instance, may be arranged in order of merit for inventiveness or for social adjustment; pictures and musical selections may be ranked for aesthetic qualities; compositions and handwriting specimens, for excellence. When translated into PR’s, these ranks may be treated as scores (p. 174).

We may illustrate the use of PR’s in such situations for the simple case of twenty-five officer candidates ranked 1, 2, 3, ... 25 in order of merit for “leadership qualities.” Here the highest-ranking man has a percentile rank of 98, and the lowest-ranking man a percentile rank of 2. How these values are calculated may be shown in the following way. On a scale running from 0 to 100, each of twenty-five individuals occupies four divisions (100/25 or 4%) of the scale. Hence, we assign to the poorest individual the midpoint of the first four divisions on the scale (0-4), or 2; to the next poorest, the midpoint of the next four divisions (4-8), or 6; and to the best person, the midpoint of the four highest divisions (96-100), or 98. Diagrams illustrating the method of assigning percentile ranks to the best and poorest persons in a group of twenty-five will make the procedure clearer:


Lowest-Ranking Individual
0 --- 1 --- 2 --- 3 --- 4        (midpoint 2)
Highest-Ranking Individual
96 --- 97 --- 98 --- 99 --- 100  (midpoint 98)


If 100 people are arranged in order of merit, what is the percentile rank of the lowest-ranking person? The answer is clear. Since there are just 100 subjects, each occupies one division (100/100 or 1%) on the percentage scale. Hence the rank of the poorest subject is .5 (midpoint of the interval 0-1) and of the best subject 99.5 (midpoint of the interval 99-100). These PR’s and those of the last example may be readily found by means of the following formula,* which converts orders of merit into equivalent percentile ranks:

$$PR = 100 - \frac{100R - 50}{N} \qquad (16)$$

(percentile ranks for individuals ranked in order of merit)

The R in the formula is the rank position of the individual, counting #1 as the highest rank in the group. Thus, the individual who ranks highest, i.e., #1, in a group of twenty-five has a PR of 100 - (100 × 1 - 50)/25, or 98; and the individual who ranks fifth (i.e., five from the top, twenty from the bottom) has a PR of 100 - (100 × 5 - 50)/25, or 82. The person who ranks fiftieth in a group of 100 has a PR of 100 - (100 × 50 - 50)/100, or 50.5, the middle of interval 50-51 on the percent scale. Since a person's percentile rank is always the midpoint of an interval on the scale which runs from 0 to 100, it is evident that no one can have a percentile rank of 0 or of 100. These two points constitute the limits of the percentile scale.

* For a table giving percentile ranks for individuals ranked in order of merit, and ranging from 11 to 100 in number, see Buros (ed.), Expressing Educational Measures as Percentile Ranks, Test … #3 (Yonkers, N.Y.: World Book Co., 1930). In this table a rank of 1 is taken to be the highest, of 2 the next highest, etc.
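Formula (16) is a one-line computation. The Python sketch below (function names are illustrative) also writes out the midpoint argument directly: N - R persons lie below the individual, and he is credited with half of his own 100/N share of the scale. The two functions are algebraically identical, which the loop at the end confirms for a group of twenty-five.

```python
def pr_from_order_of_merit(rank, n):
    """PR = 100 - (100R - 50) / N   (formula 16); rank 1 is the best."""
    return 100 - (100 * rank - 50) / n


def pr_by_midpoint(rank, n):
    """Same value from the midpoint argument: (N - R) persons below,
    plus half of this person's own 100/N share of the scale."""
    return (n - rank + 0.5) * 100 / n


for rank, n in [(1, 25), (5, 25), (25, 25), (1, 100), (50, 100), (100, 100)]:
    print(f"rank {rank} of {n}: PR = {pr_from_order_of_merit(rank, n)}")

for r in range(1, 26):
    assert abs(pr_from_order_of_merit(r, 25) - pr_by_midpoint(r, 25)) < 1e-9
```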


III. THE CUMULATIVE PERCENTAGE CURVE OR OGIVE


1. Construction of the Ogive 

The cumulative percentage curve or ogive differs from the cumulative frequency graph in that frequencies are expressed as cumulative percents of N on the Y-axis instead of as cumulative scores. Table 14 shows how cumulative frequencies are expressed as percentages of N. The distribution consists of scores on a reading test achieved by 125 seventh-grade pupils.

TABLE 14
CALCULATION OF CUMULATIVE PERCENTAGES TO UPPER LIMITS OF CLASS-INTERVALS IN A FREQUENCY DISTRIBUTION
(The data represent scores on a reading test achieved by 125 seventh-grade children)

(1)               (2)    (3)       (4)
Scores             f    Cum. f    Cum. Percent f
74.5 to 79.5       1     125      100.0
69.5 to 74.5       3     124       99.2
64.5 to 69.5       6     121       96.8
59.5 to 64.5      12     115       92.0
54.5 to 59.5      20     103       82.4
49.5 to 54.5      36      83       66.4
44.5 to 49.5      20      47       37.6
39.5 to 44.5      15      27       21.6
34.5 to 39.5       6      12        9.6
29.5 to 34.5       4       6        4.8
24.5 to 29.5       2       2        1.6
            N = 125        Rate = 1/125 = .008

Class-intervals are listed in column (1) and frequencies in column (2); in column (3) the f's have been cumulated from the low end of the distribution upward, as described before on page 74. These Cum. f's are expressed as percentages of N (125) in column (4). The conversion of Cum. f's into cumulative percents can be carried out by dividing each cumulative f by N; e.g., 2 ÷ 125 = .016, 6 ÷ 125 = .048, and so on. A better method, especially when a calculating machine is available, is to determine first the reciprocal, 1/N, called the Rate, and multiply each cumulative f in order by this fraction. As shown in Table 14, the Rate is 1/125 or .008. Hence, multiplying 2 by .008, we get .016 or 1.6%; 6 × .008 = .048 or 4.8%; 12 × .008 = .096 or 9.6%, etc.
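The Rate method amounts to computing 1/N once and then using one multiplication per row. A minimal Python sketch of column (4) of Table 14, assuming the frequencies are listed from the lowest interval (24.5 to 29.5) upward:

```python
from itertools import accumulate

# Table 14 frequencies, lowest interval (24.5 to 29.5) first
reading_f = [2, 4, 6, 15, 20, 36, 20, 12, 6, 3, 1]

n = sum(reading_f)                 # N = 125
rate = 1 / n                       # the "Rate", 1/N = .008
cum_f = list(accumulate(reading_f))
cum_pct = [round(c * rate * 100, 1) for c in cum_f]   # one multiplication per row

print(n, rate)
print(cum_f)
print(cum_pct)
```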

The curve in Figure 10 represents an ogive plotted from the data in column (4), Table 14. Class-intervals have been laid off on the X-axis, and a scale consisting of ten equal distances, each representing 10% of the distribution, has been marked off on the Y-axis. The first point on the ogive is placed 1.6 Y-units just above 29.5; the second point is 4.8 Y-units just above 34.5, etc. The last point is 100 Y-units above 79.5, the upper limit of the highest class-interval.


2. Computing Percentiles and Percentile Ranks from (a) the Cumulative Percentage Distribution and from (b) the Ogive

(a) Percentiles may be readily determined by direct interpolation in column (4), Table 14. We may illustrate by calculating the 71st percentile. Direct interpolation between the percentages in column (4) gives the following:

66.4% of the distribution up to 54.5
71.0% (given) ................ up to 55.9
82.4% of the distribution up to 59.5
16.0%                          5.0

The 71st percentile lies 4.6% above 66.4%. By simple proportion,

$$\frac{4.6}{16.0} = \frac{x}{5.0}, \qquad x = 1.4$$

(x is the distance of the 71st percentile from 54.5). The 71st percentile, therefore, is 54.5 + 1.4, or 55.9.

[Fig. 10. Cumulative Percentage Curve or Ogive Plotted from the Data of Table 14. X-axis: Scores, 24.5 to 79.5; Y-axis: Cumulative Percentages. The median and P62 = 54 (approximately) are marked on the curve.]
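Interpolation in the cumulative-percent column can be sketched in Python as below (the function name is illustrative). It assumes that 0% lies at 24.5, the lower limit of the lowest interval, and that each cumulative percent belongs to the upper limit of its interval. Besides the 71st percentile, it returns 51.65, 45.56, and 57.19 for the median, Q1, and Q3, the calculated values the text quotes for Table 14.

```python
def percentile_from_cum_pct(p, limits, cum_pct):
    """Score below which p percent falls, by straight-line interpolation
    between the cumulative percents at successive interval limits."""
    for k in range(1, len(limits)):
        lo, hi = cum_pct[k - 1], cum_pct[k]
        if lo <= p <= hi:
            if hi == lo:                   # flat stretch: no scores in the interval
                return limits[k - 1]
            return limits[k - 1] + (p - lo) / (hi - lo) * (limits[k] - limits[k - 1])
    raise ValueError("p outside the tabulated range")


# Table 14: lower limit of the lowest interval (0%), then each upper limit
limits = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5, 74.5, 79.5]
cum_pct = [0.0, 1.6, 4.8, 9.6, 21.6, 37.6, 66.4, 82.4, 92.0, 96.8, 99.2, 100.0]

for p in (25, 50, 62, 71, 75):
    print(f"P{p} = {percentile_from_cum_pct(p, limits, cum_pct):.2f}")
```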

Certain percentiles can be read directly from column (4). 
We know, for instance, that the 5th percentile is approximately 
34.5; that the 22nd percentile is approximately 44.5; that the 
38th percentile is approximately 49.5; and that the 92nd per- 
centile is exactly 64.5. Another way of expressing the same 
facts is to say that 21.6% of the seventh graders scored below 
44.5, that 92% scored below 64.5, etc.

Percentile ranks may also be determined from Table 14 by interpolation. Suppose, for example, we wish to calculate the PR of score 43. From column (4) we find that 9.6% of the scores are below 39.5. Score 43 is 3.5 (43.0 - 39.5) from this point. There are five score-units on the interval 39.5 to 44.5, which correspond to 12.0% (21.6 - 9.6) of the distribution; hence, 3.5/5 × 12.0, or 8.4, is the percentage distance of score 43 from 39.5. Since 9.6% (up to 39.5) + 8.4% (from 39.5 to 43.0) comprise 18% of the distribution, this percentage of N lies below score 43. Hence, the PR of 43 is 18. See the detailed calculation below.


9.6% of distribution up to 39.5
   ? ............... score 43.0 (given)
21.6% of distribution up to 44.5
12.0%                       5.0

Score 43.0 is 3.5/5 × 12.0% or 8.4% from 39.5; hence score 43.0 is 9.6% + 8.4% or 18.0% into the distribution.

It should be noted that the cumulative percents in column (4) 
give the PR’s of the upper limits of the class-intervals in which 
the scores have been tabulated. The PR of 74.5, for example, 
is 99.2; of 64.5, 92.0; of 44.5, 21.6, etc. These PR’s are the
ranks of given points in the distribution, and are not the PR’s 
of scores. 
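The reverse interpolation, from a score to its PR, can be sketched the same way in Python (function name illustrative; same assumption that 0% lies at 24.5). It gives 18.0 for score 43; for scores 47 and 71 it gives about 29.6 and 97.5, close to the values of 30 and 97 read from the ogive below.

```python
def pr_from_cum_pct(score, limits, cum_pct):
    """Percent of N below `score`, interpolating within the interval that holds it."""
    for k in range(1, len(limits)):
        if limits[k - 1] <= score <= limits[k]:
            share = (score - limits[k - 1]) / (limits[k] - limits[k - 1])
            return cum_pct[k - 1] + share * (cum_pct[k] - cum_pct[k - 1])
    raise ValueError("score outside the tabulated range")


# Table 14: lower limit of the lowest interval (0%), then each upper limit
limits = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5, 74.5, 79.5]
cum_pct = [0.0, 1.6, 4.8, 9.6, 21.6, 37.6, 66.4, 82.4, 92.0, 96.8, 99.2, 100.0]

for s in (43, 47, 71):
    print(s, round(pr_from_cum_pct(s, limits, cum_pct), 1))
```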

(b) Percentiles and percentile ranks may also be determined quickly and fairly accurately from the ogive of the frequency distribution plotted in Figure 10. To obtain P50, the median, for example, draw a line from 50 on the Y-scale parallel to the X-axis, and where this line cuts the curve drop a perpendicular to the X-axis. This operation will locate the median at 51.5, approximately. The exact median, calculated from Table 14, is 51.65. Q1 and Q3 are found in the same way as the median. P25 or Q1 falls approximately at 45.0 on the X-axis, and P75 or Q3 falls at 57.0. These values may be compared with the calculated Q1 and Q3, which are 45.56 and 57.19, respectively. Other percentiles are read in the same way. To find P62, for instance, begin with 62 on the Y-axis, go horizontally over to the ogive, and drop a perpendicular to locate P62 approximately at 54.


In order to read the percentile rank of a given score from the ogive, we reverse the process followed in determining percentiles. Score 71, for example, has a PR of 97, approximately (see Figure 10). Calculation consists in starting with score 71 on the X-axis, going vertically up to the ogive, and then horizontally across to the Y-axis to locate the PR at 97 on the cumulative percentage scale. The PR of score 47 is found in the same way to be approximately 30.

It will be noted that percentiles and percentile ranks are 
usually slightly in error when read from an ogive. If the curve 
is carefully drawn, however, the diagram fairly large and the 
scale divisions precisely marked, percentiles and PR’s may be 
read to a degree of accuracy sufficient for most purposes. 


3. Other Uses of the Ogive 
(1) Comparison of Groups 

A useful over-all comparison of two or more groups is pro- 
vided when ogives representing their scores on a given test 
are plotted upon the same coördinate axes. An illustration is 
given in Figure 11, which shows the ogives of the scores earned by two groups of children (200 ten-year-old boys and 200 ten-year-old girls) upon an arithmetic reasoning test of sixty items. Data from which these ogives were constructed are given in Table 15.

Several interesting observations can be made from Figure 11. The boys’ ogive lies to the right of the girls’ over the entire range, showing that the boys score consistently higher than the girls. Differences in achievement between the two groups are shown by the distances separating the two curves at various levels. It is clear that differences at the extremes (between the very high-scoring and the very low-scoring boys and girls) are not so great as are differences over the middle range. A more detailed analysis of the achievement of these two groups comes out in a comparison of certain points in the distribution. The boys’ median is approximately 42, the girls’ 32; and the difference between these measures is represented in Figure 11 by the line AB. The difference between the boys’ Q1 and the girls’ Q1 is represented by the line CD; and the difference between the two Q3's is shown by the line EF. It is clear that the groups differ more at the median than at either quartile, and are farther separated at Q3 than at Q1.


TABLE 15
FREQUENCY DISTRIBUTIONS OF THE SCORES MADE BY 200 TEN-YEAR-OLD BOYS AND 200 TEN-YEAR-OLD GIRLS ON AN ARITHMETIC REASONING TEST

                        Boys                                 Girls
Scores    f   Cum. f   Cum. %   Smoothed Cum. %     f   Cum. f   Cum. %   Smoothed Cum. %
60-64     0    200     100.0        100.0           0    200     100.0        100.0
55-59     2    200     100.0         99.7           1    200     100.0         99.8
50-54    25    198      99.0         95.2           0    199      99.5         99.7
45-49    48    173      86.5         82.7           9    199      99.5         98.0
40-44    47    125      62.5         62.7          27    190      95.0         92.0
35-39    19     78      39.0         43.7          44    163      81.5         78.7
30-34    26     59      29.5         28.3          43    119      59.5         59.7
25-29    15     33      16.5         18.3          40     76      38.0         38.5
20-24     9     18       9.0         10.0          10     36      18.0         23.0
15-19     7      9       4.5          4.8          20     26      13.0         11.3
10-14     2      2       1.0          1.8           1      6       3.0          6.2
5-9       0      0        .0           .3           2      5       2.5          2.3
0-4       0      0        .0           .0           3      3       1.5          1.3
        200                                       200
Rate = 1/200 = .005


The extent to which one distribution overlaps another, either as a whole or at designated points, can be determined from ogives. By extending the vertical line through the boys’ median up to the ogive of the girls’ scores, it is clear that approximately 88% of the girls fall below the boys’ median. Hence, approximately 12% of the girls exceed the median of the boys in arithmetic reasoning. Proceeding in the same way, the vertical line through the girls’ median cuts the boys’ ogive at approximately the 24th percentile. Therefore 24% of the boys fall below the girls’ median, and 76% are above this point. Still another illustration may be helpful. Suppose the problem is to determine what percentage of the girls score at or above the boys’ 60th percentile. The answer is found by locating first the point where the horizontal line through 60 cuts the boys’ ogive. We then find the point on the girls’ ogive directly above this value, and from here proceed horizontally across to locate the percentile rank of this point at 93. Since 93% of the girls fall below the boys’ 60th percentile, about 7% score above this point.

[Fig. 11. Ogives Representing Scores Made by 200 Boys and 200 Girls on an Arithmetic Reasoning Test. (See Table 15.) X-axis: Scores, 4.5 to 59.5; Y-axis: Cumulative Percents.]
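The text reads these overlaps from the ogives by eye. The same quantities can be approximated numerically by straight-line interpolation in the unsmoothed cumulative percents of Table 15, as in the Python sketch below. The helper name is hypothetical, and the sketch assumes -0.5 as the lower limit of the 0-4 interval. It finds about 88% of the girls below the boys’ median, about 24% of the boys below the girls’ median, and roughly 93.6% of the girls below the boys’ 60th percentile, close to the graphical readings.

```python
def interp(x, xs, ys):
    """Straight-line interpolation of y at x; xs must be non-decreasing."""
    for k in range(1, len(xs)):
        if xs[k - 1] <= x <= xs[k]:
            if xs[k] == xs[k - 1]:
                return ys[k - 1]
            t = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
            return ys[k - 1] + t * (ys[k] - ys[k - 1])
    raise ValueError("value outside the tabulated range")


# Table 15: interval limits from -0.5 to 64.5 and the cumulative percents at each
limits = [-0.5 + 5 * k for k in range(14)]
boys = [0.0, 0.0, 0.0, 1.0, 4.5, 9.0, 16.5, 29.5, 39.0, 62.5, 86.5, 99.0, 100.0, 100.0]
girls = [0.0, 1.5, 2.5, 3.0, 13.0, 18.0, 38.0, 59.5, 81.5, 95.0, 99.5, 99.5, 100.0, 100.0]

# Percentiles: interpolate score as a function of cumulative percent
boys_median = interp(50, boys, limits)
girls_median = interp(50, girls, limits)
boys_p60 = interp(60, boys, limits)

# Percentile ranks in the other group: interpolate cumulative percent at a score
girls_below_boys_median = interp(boys_median, limits, girls)
boys_below_girls_median = interp(girls_median, limits, boys)
girls_below_boys_p60 = interp(boys_p60, limits, girls)

print(f"boys' median {boys_median:.1f}: {girls_below_boys_median:.1f}% of girls below")
print(f"girls' median {girls_median:.1f}: {boys_below_girls_median:.1f}% of boys below")
print(f"boys' P60 {boys_p60:.1f}: {girls_below_boys_p60:.1f}% of girls below")
```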


(2) Percentile Norms 

Norms are measures of achievement which represent the typical performance of a designated group or groups. The norm for ten-year-old boys in height, and the norm for seventh-grade pupils in City X in arithmetic, are usually the mean or the median for the group. But norms may be much more detailed and may be reported for other points in the distribution as, for example, Q1, Q3, and various percentiles.


Percentile norms are especially useful in dealing with educational achievement examinations, when one wishes to evaluate and compare the achievement of a given student in a number of subject-matter tests. If the student earns a score of 63 on an achievement test in arithmetic, and a score of 143 on an achievement test in English, we have no way of knowing from the scores alone whether his achievement is good, medium, or poor, or how his standings in arithmetic and in English compare. If, however, we know that a score of 63 in arithmetic has a PR of 52, and a score of 143 in English a PR of 68, we may say at once that this student is average in arithmetic (52% of the students score lower than he) and good in English (68% of the students score lower than he).

Percentile norms may be determined directly from the smoothed ogives of score distributions. Figure 12 represents the smoothed ogives of the two distributions of scores in arithmetic reasoning given in Table 15. Vertical lines drawn to the base line from points on the ogive locate the various percentile points. In Table 16 below, selected percentile norms in the arithmetic reasoning test have been tabulated for boys and girls separately. This table of norms may, of course, be extended by the addition of other intermediate or extreme values. Calculated percentiles are included in the table for comparison with percentiles read from the smoothed ogives. These calculated values are useful as a check on the graphically determined points, but ordinarily need not be found.

TABLE 16
PERCENTILE NORMS FOR ARITHMETIC REASONING TEST (TABLE 15) OBTAINED FROM SMOOTHED OGIVES IN FIGURE 12

                  Girls                    Boys
Cum. %'s    Ogive   Calculated     Ogive   Calculated
99          52.0      49.0         57.5      54.5
95          46.5      44.5         54.5      52.9
90          43.5      42.7         52.5      50.9
80          40.0      39.2         49.0      48.1
70          37.0      36.9         46.5      46.1
60          35.0      34.6         44.0      44.0
50          32.5      32.3         41.5      41.8
40          30.0      30.0         39.0      39.7
30          27.0      27.5         35.0      34.8
20          23.5      25.0         30.0      30.9
10           …        18.0         24.5      25.2
 5           …        15.5          …         …
 1           3.5       3.3          …        14.5
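The “Calculated” columns of Table 16 come from formula (15) applied to the frequencies of Table 15. A Python sketch of that check follows. The function is the same illustrative one used for Table 13, and it assumes -0.5 as the lower limit of the 0-4 interval. For the 10th through 99th percentiles its results agree with the tabled calculated values to within about .1; the small last-place differences presumably come from rounding in the hand calculation. The 1st and 5th percentiles are left out because the table's treatment of the bottom interval is not certain.

```python
def percentile(p, lowest_limit, width, freqs):
    """P_p = l + ((pN - F) / f_p) * i   (formula 15); freqs lowest interval first."""
    n = sum(freqs)
    target = p * n / 100          # pN
    below = 0                     # F
    for k, f in enumerate(freqs):
        if f > 0 and below + f >= target:
            return lowest_limit + k * width + (target - below) / f * width
        below += f
    raise ValueError("empty distribution")


# Table 15 frequencies, lowest interval (0-4) first
boys_f = [0, 0, 2, 7, 9, 15, 26, 19, 47, 48, 25, 2, 0]
girls_f = [3, 2, 1, 20, 10, 40, 43, 44, 27, 9, 0, 1, 0]

print("Cum. %   Girls    Boys")
for p in (99, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10):
    g = percentile(p, -0.5, 5, girls_f)
    b = percentile(p, -0.5, 5, boys_f)
    print(f"{p:>5}  {g:7.1f} {b:7.1f}")
```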

It is evident that percentile norms read from an ogive are not strictly accurate, but the error is slight except at the top and bottom of the distribution. Estimates of these extreme percentiles from smoothed ogives are probably more nearly true values than are the calculated points, since the smoothed curve represents what we might expect to get from larger groups or in additional samplings.

[Fig. 12. Smoothed Ogives of the Scores in Table 15. X-axis: Scores, 4.5 to 59.5; Y-axis: Cumulative Percents.]

The ogives in Figure 12 were smoothed in order to iron out minor kinks and irregularities in the curves. Owing to the smoothing process, these curves are more regular and continuous than are the original ogives in Figure 11. The only difference between the process of smoothing an ogive and smoothing a frequency polygon (p. 16) is that we average cumulative percentage frequencies in the ogive instead of actual frequencies. Smoothed cumulative percentage frequencies are given in Table 15. The smoothed cumulative percent frequency to be plotted above 24.5, boys’ distribution, is (16.5 + 9.0 + 4.5)/3, or 10.0; for the girls’ distribution it is (38.0 + 18.0 + 13.0)/3, or 23.0. Care must be taken at the extremes of the distribution, where the procedure is slightly different. In the boys’ distribution, for example, the smoothed cumulative percent frequency at 9.5 is (1.0 + .0 + .0)/3, or .3, and at 59.5 it is (100.0 + 100.0 + 99.0)/3, or 99.7. At the same point in the girls’ distribution it is (100.0 + 100.0 + 99.5)/3, or 99.8. At 64.5 and 4.5, both of which lie outside the boys’ distribution, the cumulative percentage frequencies are 100 [(100 + 100 + 100)/3] and 0 [(0 + 0 + 0)/3], respectively. Note that the smoothed ogive extends one interval beyond the original at both extremes of the distribution.
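The three-point averaging just described can be sketched in Python (function name illustrative). It assumes 0% for the point just below the lowest tabulated interval and 100% for the point just above the highest, which is how the end values in Table 15 work out. Applied to the unsmoothed cumulative percents of Table 15, it reproduces both smoothed columns.

```python
def smooth_cumulative(cum_pct):
    """Three-point moving average of cumulative percents (lowest interval first).

    Each value is averaged with its neighbors below and above; the point just
    below the distribution is taken as 0% and the point just above as 100%.
    """
    padded = [0.0] + list(cum_pct) + [100.0]
    return [round((padded[k] + padded[k + 1] + padded[k + 2]) / 3, 1)
            for k in range(len(cum_pct))]


# Table 15 cumulative percents, 0-4 first through 60-64
boys = [0.0, 0.0, 1.0, 4.5, 9.0, 16.5, 29.5, 39.0, 62.5, 86.5, 99.0, 100.0, 100.0]
girls = [1.5, 2.5, 3.0, 13.0, 18.0, 38.0, 59.5, 81.5, 95.0, 99.5, 99.5, 100.0, 100.0]

print(smooth_cumulative(boys))
print(smooth_cumulative(girls))
```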
There is little justification for smoothing an ogive which is 
already quite regular or an ogive which is very jagged and ir- 
regular. In the first instance, smoothing accomplishes little if 
anything; in the second, it may seriously mislead. A smoothed
curve shows what we might expect to get if the test or sampling, 
or both, were different (and perhaps better) than they actually 
were. Smoothing should never be a substitute for getting 
additional data or for constructing an improved test. It should 
certainly be avoided when the group is small and the ogive very 
irregular. Smoothing is perhaps most useful when the ogives 
show small irregularities here and there (see Figure 11) which 


may reasonably be assumed to have arisen from small and not 
very important factors. 


GRAPHIC METHODS AND PERCENTILES 93 


IV. OTHER GRAPHICAL METHODS 


Data obtained from many problems in mental measurement, 
especially those which involve the study of changes attributable 
to growth, practice, learning, and fatigue, may be treated 
profitably by graphical methods. Two widely used devices are 
the line graph, frequently found in experimental psychology, 
and the bar diagram more often met with, perhaps, in education. 


[Figure: number of ideas remembered by age, boys and girls; Y-axis "Number of Ideas"; X-axis "Age", 8 to 18 years and Adults]

Fig. 13. Logical Memory. Age is represented on X-line (horizontal); Score, i.e., number of ideas remembered, on Y-line (vertical). (After Pyle.)


These two methods will be described in this section. For a discussion of other graphical methods, the reader is referred to books dealing specifically with the subject of graphics.*


1. The Line Graph

Figure 13 shows an age-progress curve. This graph represents the change in "logical memory for a connected passage" in boys and girls from eight to eighteen years old. Norms for adults are also included on the diagram. Age is represented on the horizontal or X-axis, and "average number of ideas reproduced" at each age level is marked off on the vertical or Y-axis. Memory ability as measured by this test rises to a maximum in the … groups, after which there is a slight decline followed by a rise at the adult level. There is a small sex difference throughout, the girls being higher on the average at each age.


* For a simple treatment, see Rugg, H. O., A Primer of Graphics and Statistics for Teachers, 1925. More advanced treatments may be found in Williams, J. H., Graphic Methods in Education, 1924, and Karsten, K. G., Charts and Graphs, 1923.


94 STATISTICS IN PSYCHOLOGY AND EDUCATION
Figure 14 represents a learning or practice curve. These curves show the improvement in sending and receiving telegraphic messages resulting from successive trials at the same task over a period of forty-eight weeks. Improvement as measured by the number of letters sent or received per minute is indicated along the Y-axis. Weeks of practice at the given task are represented by equal intervals on the X-axis.

[Figure: two curves labeled Sending and Receiving; Y-axis "Letters per Minute"; X-axis "Weeks of Practice", 4 to 48]

Fig. 14. Improvement in Telegraphy. Weeks of practice on X-line; number of letters per minute on Y-line. (After Bryan & Harter.)

Figure 15 is a performance or practice "curve." It represents twenty-five successive trials with the hand dynamometer made by one man and one woman.




[Figure: two curves labeled Man and Woman; Y-axis "Grip in Kgs."; X-axis "Trials", 1 to 25]

Fig. 15. Hand Dynamometer Readings in Kilograms for Twenty-five Successive Grips at Intervals of Ten Seconds. Two subjects, a man and a woman.


Figure 16 is a curve of retention. It represents memory retention as measured by the percentage of the original material retained after the passage of different time intervals. The time intervals between learning and relearning are laid off on the X-axis, and the percent retained, as measured by relearning, on the Y-axis.


[Figure: Y-axis "Percent Retained", 10 to 100; X-axis "Time between Learning and Relearning", marked 1 hr., 9 hr., 24 hr., 48 hr., 144 hr.]

Fig. 16. Curve of Retention. The numbers on the baseline give hours elapsed from time of learning; numbers along the Y-axis give percent retained.




2. The Bar Diagram

The bar graph is sometimes used in psychology to compare the relative amounts of some attribute (height, intelligence, educational achievement, etc.) possessed by two or more groups. In education the bar graph may be used to compare (usually in percentage terms) several different variables. Examples are: the cost of instruction in various schools or in different counties; distribution of student time in and out of school; teachers' salaries by states or districts; relative expenditures for various purposes. The comparative bar graph in Figure 17 is one in which a set of bars is used to represent the percentage of officers in various branches of the military service during World War I who received grades of A and B or C upon the Army Alpha Examination. The bars are arranged in order, the group receiving the highest percent of A's and B's being placed at the top. It is clear from the diagram that the Engineers, who ranked first, received about 95% A's and B's and about 5% C's. The Veterinary Corps, which ranked last, received about 60% A's and B's and 40% C's.

[Figure: one horizontal bar per branch of service; percentage scale running from 40 through 0 to 90]

Fig. 17. Comparative Bar Graphs. The bars represent the percentage in each division of the military service receiving A's and B's or C's.
Another illustration of a bar graph is shown in Figure 18. The two parallel rectangles or "bars" represent student enrollment in two city high schools. Each bar is divided into four parts to represent freshmen, sophomores, juniors, and seniors. The size of a division is proportional to the percentage which each class is of the whole group. This type of graph is often called a divided-bar graph.

[Figure: two divided bars. School A: Freshmen 38%, Sophomores 31%, Juniors 17%, Seniors 14%. School B: Freshmen …, Sophomores 30%, Juniors 16%, Seniors 9%]

Fig. 18. Divided Bar Graphs. The two bars represent student enrollment in two high schools. Each bar is divided into four divisions. The length of a division shows the proportion or percentage of students in that class.
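The rule for a divided bar (each division's length proportional to its percentage) can be sketched in Python as a text rendering. This is only an illustration: the function names and fill characters are hypothetical, and the percentages are School A's from Figure 18.

```python
def division_lengths(percents, bar_length):
    """Length of each division of a divided bar, proportional to its share of the total."""
    total = sum(percents)
    return [bar_length * p / total for p in percents]


def render_divided_bar(percents, bar_length=40, fills="#=-."):
    """Draw one divided bar as text, one fill character per division.

    Each division is rounded to whole characters independently, so the
    drawn bar may occasionally differ from bar_length by a character.
    """
    lengths = division_lengths(percents, bar_length)
    return "".join(ch * round(n) for ch, n in zip(fills, lengths))


if __name__ == "__main__":
    # School A of Figure 18: freshmen, sophomores, juniors, seniors.
    print("School A |" + render_divided_bar([38, 31, 17, 14]) + "|")
```

Dividing by the sum of the percentages, rather than by 100, keeps the bar the same length even when rounded percentages do not total exactly 100.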


PROBLEMS 


1. The following distributions represent the achievement of two groups, A and B, upon a memory test.

(a) Plot cumulative frequency graphs of Group A's and of Group B's scores, observing the 75% rule.

(b) Plot ogives of the two distributions A and B upon the same axes.

(c) Determine P₃₀, P₆₀, and P₉₀ graphically from each of the ogives, and compare the graphically determined with the calculated values.

(d) What is the percentile rank of score 55 in Group A's distribution? In Group B's distribution?

(e) A percentile rank of 70 in Group A corresponds to what percentile rank in Group B?

(f) What percent of Group A exceeds the median of Group B?


Scores     Group A    Group B
79-83         6          8
74-78         7          8
69-73         8          9
64-68        10         16
59-63        12         20
54-58        15         18
49-53        23         19
44-48        16         11
39-43        10         13
34-38        12          8
29-33         6          7
24-28         3          2
          N = 128    N = 139
2. Construct an ogive of the following distribution of scores.

Scores               f
159.5 to 169.5       1
149.5 to 159.5       5
139.5 to 149.5      13
129.5 to 139.5      45
119.5 to 129.5      40
109.5 to 119.5      30
99.5 to 109.5       51
89.5 to 99.5        48
79.5 to 89.5        36
69.5 to 79.5        10
59.5 to 69.5         5
49.5 to 59.5         1
                N = 285

Read off percentile norms for the cumulative percentages: 99, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, and 1.




3. (a) In accordance with their scores upon a learning test, twenty children are ranked in order of merit. Calculate the percentile rank of each child.

(b) If sixty children are ranked in order of merit, what is the percentile rank of the first, tenth, fortieth, and sixtieth?

4. Given the following data from five cities in the United States, represent the facts graphically by means of a bar graph.

                   Proportion of population which is
City    Native White    Foreign-born White    Negro
A           .65               .30              .05
B           .60               .10              .30
C           .50               .45              .05
D           .40               .20              .40
E           .30               .10              .60
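The "calculated" percentiles asked for in Problem 1 can be checked with a short Python sketch. It assumes the usual grouped-data convention that the cases in an interval are spread evenly between its exact limits (23.5, 28.5, and so on); the function and variable names are illustrative, not taken from the text.

```python
# Group A and Group B frequencies from Problem 1, lowest interval (24-28) first.
GROUP_A = [3, 6, 12, 10, 16, 23, 15, 12, 10, 8, 7, 6]
GROUP_B = [2, 7, 8, 13, 11, 19, 18, 20, 16, 9, 8, 8]
WIDTH = 5
LOWER = [23.5 + WIDTH * i for i in range(len(GROUP_A))]  # exact lower limits


def percentile(freqs, p, lower=LOWER, width=WIDTH):
    """Score below which p percent of the cases fall (linear interpolation)."""
    n = sum(freqs)
    target = p * n / 100          # number of cases that must lie below P_p
    cum = 0
    for lo, f in zip(lower, freqs):
        if f and cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f
    raise ValueError("p must be between 0 and 100")


def percentile_rank(freqs, score, lower=LOWER, width=WIDTH):
    """Percent of cases falling below `score`, cases spread evenly within intervals."""
    n = sum(freqs)
    below = 0.0
    for lo, f in zip(lower, freqs):
        if score >= lo + width:
            below += f                          # whole interval lies below the score
        elif score > lo:
            below += (score - lo) / width * f   # only part of the interval lies below
    return 100 * below / n


if __name__ == "__main__":
    for name, freqs in (("A", GROUP_A), ("B", GROUP_B)):
        print(name, [round(percentile(freqs, p), 2) for p in (30, 60, 90)])
    # (e) PR in Group B of the score that has PR 70 in Group A
    print(round(percentile_rank(GROUP_B, percentile(GROUP_A, 70))))
    # (f) percent of Group A above the median of Group B
    print(round(100 - percentile_rank(GROUP_A, percentile(GROUP_B, 50)), 1))
```

The interpolated values agree with the "Cal." column of the answers to within rounding in the second decimal.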
ANSWERS

1. (c)        Group A            Group B
            Ogive    Cal.      Ogive    Cal.
   P₃₀      46.0    45.81      48.5    48.69
   P₆₀      56.0    55.77      59.75   59.85
   P₉₀      74.0    73.64      75.5    74.81

   (d) 59; 49
   (e) 62
   (f) 39-40% of Group A exceed the median of Group B.

2. Read from ogive:

   Cum. Percents:   99     95     90     80     70     60     50    40    30     20    10
   Percentiles:    159   142.5  137.5  131.5  124.5  116.5  107   102   96.5   91    82.5

3. (a) 97.5; 92.5; 87.5; 82.5; 77.5; 72.5; 67.5; 62.5; 57.5; 52.5; 47.5; 42.5; 37.5; 32.5; 27.5; 22.5; 17.5; 12.5; 7.5; 2.5
   (b) 99.17; 84.17; 34.17; .83.
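The answers to Problem 3 follow from the usual rule for converting a rank in order of merit into a percentile rank, PR = 100 − (100R − 50)/N, where R is the rank (1 = best) and N the number ranked. That formula is not stated in this section; it is assumed here because it reproduces the answers. A minimal Python sketch:

```python
def rank_to_pr(rank, n):
    """Percentile rank of the individual ranked `rank` (1 = best) among n.

    Each person is taken to occupy the middle of a band 100/n points wide,
    which is what the "- 50" in the numerator expresses.
    """
    return 100 - (100 * rank - 50) / n


if __name__ == "__main__":
    print([rank_to_pr(r, 20) for r in range(1, 21)])                # Problem 3(a)
    print([round(rank_to_pr(r, 60), 2) for r in (1, 10, 40, 60)])  # Problem 3(b)
```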


Additional Problems and Questions on Chapters I-IV 


1. Describe the characteristics of those distributions for which the 
mean is not an adequate measure of central tendency. 
2. When is it inadvisable to use the coefficient of variation? 


3. What is a multimodal distribution? 


4. It has been said that by the application of eugenics it would be possible to raise the intelligence level of the race, so that more people would be above the median I.Q. of 100. Comment on this statement.
5. Can the σ of one test usually be compared directly with the σ of another test?

6. What effect will an increase in N probably have upon σ? (p. 54.)

7. What is the difference between a percentile and the ordinary percent grade used in school?

8. Does a percentile rank of 65 earned by a given pupil mean that 65% of the group make scores above him; that 65% make the same score; or that 65% make scores below him?

9. What is indicated by the relatively “flat” portion of an ogive? 

10. Will increasing the size of the class-intervals used in grouping tend 
to make the frequency polygon more irregular? 


11. Calculate the mean, median, mode, Q, and SD for each of the following distributions:

(1) Scores   f      (2) Scores   f      (3) Scores   f
    90-99     2         14-15     3          25       1
    80-89    12         12-13     8          24       2
    70-79    22         10-11    15          23       6
    60-69    20          8-9     20          22       8
    50-59    14          6-7     10          21       5
    40-49     4          4-5      4          20       2
    30-39     1                              19       1
          N = 75              N = 60               N = 25


12. (a) Plot the distribution in 11 (1) as a frequency polygon and histogram upon the same coördinate axes.
(b) Plot the distribution in 11 (2) as an ogive. Locate graphically the median, Q₁, and Q₃. Determine the PR of score 9; of score 12.


ANSWERS

11. (1) Mean = 68.10       (2) Mean = 9.23       (3) Mean = 22.04
        Median = 68.75         Median = 9.10         Median = 22.06
        Mode = 70.05           Mode = 8.84           Mode = 22.10
        Q = 9.01               Q = 1.69              Q = .91
        SD = 12.50             SD = …                SD = 1.34

12. (b) Mdn = 9.0; Q₁ = 7.5; Q₃ = 11.0 (read from ogive)
    PR of 9 = 50; of 12 = 84.5
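The statistics asked for in Problem 11 can be checked with a short Python sketch for grouped data. It assumes midpoints for the mean and SD, even spread within intervals for the median and quartiles, Q = (Q₃ − Q₁)/2, and the approximate ("crude") mode, Mode = 3 Mdn − 2 Mean. That last formula is an assumption here, though it reproduces the answers given. The function name is illustrative.

```python
import math


def grouped_statistics(lower_limits, freqs, width):
    """Mean, median, mode, Q and SD of a grouped frequency distribution.

    lower_limits: exact lower limits of the intervals, lowest first.
    Scores within an interval are treated as concentrated at its midpoint
    (for the mean and SD) or spread evenly over it (for the median and Q).
    """
    n = sum(freqs)
    mids = [lo + width / 2 for lo in lower_limits]
    mean = sum(f * m for f, m in zip(freqs, mids)) / n
    sd = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / n)

    def point(p):
        # Score below which p percent of the cases fall.
        target = p * n / 100
        cum = 0
        for lo, f in zip(lower_limits, freqs):
            if f and cum + f >= target:
                return lo + (target - cum) / f * width
            cum += f
        raise ValueError(p)

    median = point(50)
    q = (point(75) - point(25)) / 2        # semi-interquartile range
    mode = 3 * median - 2 * mean           # approximate ("crude") mode
    return {"mean": mean, "median": median, "mode": mode, "Q": q, "SD": sd}


if __name__ == "__main__":
    # Distribution 11 (1): intervals 30-39 ... 90-99, lowest first.
    stats = grouped_statistics([29.5 + 10 * i for i in range(7)],
                               [1, 4, 14, 20, 22, 12, 2], 10)
    print({k: round(v, 2) for k, v in stats.items()})
```

For distribution 11 (3), where each score is its own interval, the same function applies with exact lower limits 18.5, 19.5, … and a width of 1.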


CHAPTER V
THE NORMAL PROBABILITY CURVE


I. THE MEANING AND IMPORTANCE OF THE NORMAL PROBABILITY DISTRIBUTION


1. Introduction 


The four diagrams, two polygons and two histograms, in Figure 19 represent frequency distributions of data drawn from anthropometry, psychology, and meteorology. It is clear, upon superficial examination, that all of these graphs have the same general form: the measures are concentrated closely around the center and taper off from this central high point or crest to the left and right. There are relatively few measures at the low-score end of the scale; an increasing number up to a maximum at the middle position; and a progressive falling-off toward the high-score end of the scale. If we divide the area under each curve (the area between the curve and the X-axis) by a line drawn perpendicular from the central high point to the baseline, the two parts thus formed will be similar in shape and very nearly equal in area. It is clear, therefore, that each figure exhibits nearly perfect bilateral symmetry. The perfectly symmetrical curve, or frequency surface, to which all of the graphs in Figure 19 approximate, is shown in Figure 20. This bell-shaped figure is called the normal probability curve and is of great value in mental measurement. An understanding of the characteristics of the frequency distribution represented by the normal curve is essential to the student of experimental psychology and mental measurement. This chapter, therefore, will be concerned with the normal distribution and its frequency polygon, the normal probability curve.

[Figure 19, four panels:
1. Form L I.Q. distribution and best-fitting normal curve, ages 2½ to 18 (from McNemar, Q., The Revision of the Stanford-Binet Scale, p. 19). X-axis: I.Q., 60 to 140.
2. Memory span for digits, 123 adult women students (after Thorndike). X-axis: Digit Span, 2 to 16; Y-axis: Frequencies.
3. Statures of 8585 adult males born in British Isles (after Yule). X-axis: Stature in Inches, 58 to 78; Y-axis: Freq. per Inch Interval of Stature.
4. Frequency distribution of barometer heights at Southampton: 4748 observations (after Yule). X-axis: Height in inches; Y-axis: Frequency per 1/10 inch interval.]

Fig. 19. Frequency Distributions Drawn from Different Fields.

[Figure: the normal probability curve]

Fig. 20. Normal Probability Curve.


2. Elementary Principles of Probability

Perhaps the simplest approach to an understanding of the normal probability curve is through a consideration of the elementary principles of probability. As used in statistics, the "probability" of a given event is defined as the expected frequency of occurrence of this event among events of a like sort. This expected frequency of occurrence may be based upon a knowledge of the conditions determining the occurrence of the phenomenon, as in dice-throwing or coin-tossing, or upon empirical data, as in mental and social measurements.

The probability of an event may be stated most simply, perhaps, as a ratio. We know, for example, that the probability of an unbiased coin falling heads is 1/2, and that the probability of a die showing a two-spot is 1/6. These ratios, called probability ratios, are defined by that fraction the numerator of which equals the desired outcome or outcomes and the denominator of which equals the total possible outcomes. A probability ratio always falls between the limits .00 (impossibility of occurrence) and 1.00 (certainty of occurrence). Thus the probability that the sky will fall is .00; that an individual now living will some day die is 1.00. Between these limits are all possible degrees of likelihood which may be expressed by appropriate ratios.

Let us now apply these simple principles of probability to the specific case of what happens when we toss coins.* If we toss one coin, obviously it must fall either heads (H) or tails (T) 100% of the time; and furthermore, since there are only two possible outcomes, a head or a tail is equally probable. Expressed as a ratio, therefore, the probability of H is 1/2; of T, 1/2; and

$$ (H + T) = \tfrac{1}{2} + \tfrac{1}{2} = 1.00 $$


If we toss two coins, (a) and (b), at the same time, there are four possible arrangements which the coins may take:

      (1)    (2)    (3)    (4)
      a b    a b    a b    a b
      H H    H T    T H    T T

Both coins (a) and (b) may fall H; (a) may fall H and (b) T; (b) may fall H and (a) T; or both coins may fall T. Expressed as ratios, the probability of two heads is 1/4 and the probability of two tails 1/4. Also, the probability of an HT combination is 1/4, and of a TH combination 1/4. And since it ordinarily makes no difference which coin falls H or which falls T, we may add these two ratios (or double the one) to obtain 1/2 as the probability of an HT combination. The sum of our probability ratios is 1/4 + 1/2 + 1/4, or 1.00.

* Coin-tossing and dice-throwing furnish easily understood and often used illustrations of the so-called "laws of chance."
Let us go a step farther and increase the number of coins to three. If we toss three coins (a), (b), and (c) simultaneously, there are eight possible outcomes:

      (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)
     a b c  a b c  a b c  a b c  a b c  a b c  a b c  a b c
     H H H  H H T  H T H  T H H  H T T  T H T  T T H  T T T

Expressed as ratios, the probability of three heads is 1/8 (combination 1); of two heads and one tail 3/8 (combinations 2, 3, and 4); of one head and two tails 3/8 (combinations 5, 6, and 7); and of three tails 1/8 (combination 8). The sum of these probability ratios is 1/8 + 3/8 + 3/8 + 1/8, or 1.00.
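The listing method just used can be sketched in Python by enumerating every arrangement of heads and tails and counting how many show each number of heads. This is a minimal illustration using the standard library (`itertools.product`, `collections.Counter`, `fractions.Fraction`); the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def head_probabilities(n_coins):
    """Probability ratio of each possible number of heads when n coins are tossed."""
    outcomes = list(product("HT", repeat=n_coins))   # every arrangement, e.g. ('H', 'T', 'H')
    heads = Counter(outcome.count("H") for outcome in outcomes)
    return {k: Fraction(heads[k], len(outcomes)) for k in range(n_coins, -1, -1)}


if __name__ == "__main__":
    for k, ratio in head_probabilities(3).items():
        print(f"{k} heads: {ratio}")
```

Exact fractions are used so that the ratios print as 1/8, 3/8, and so on, and their sum is exactly 1.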

By exactly the same method used above for two and for three coins, we can determine the probability of different combinations of heads and tails when we have four, five, or any number of coins. These various outcomes may be obtained in a somewhat more direct way, however, than by writing down all of the different combinations which may occur. If there are n independent factors, the probability of the presence or absence of each being the same, the "compound" probabilities of the appearance of various combinations of factors will be expressed by expansion of the binomial (p + q)ⁿ. In this expression p equals the probability that a given event will happen, q the probability that the event will not happen, and the exponent n indicates the number of factors (e.g., coins) operating to produce the final result.* If we substitute H for p and T for q (tails, or non-heads), we have for two coins (H + T)²; and squaring the binomial,

$$ (H + T)^2 = H^2 + 2HT + T^2 $$

This expansion may be written

1 H²     1 chance in 4 of 2 heads;               probability ratio = 1/4
2 HT     2 chances in 4 of 1 head and 1 tail;    probability ratio = 1/2
1 T²     1 chance in 4 of 2 tails;               probability ratio = 1/4
Total = 4

These outcomes are identical with those obtained above by listing the three different combinations possible when two coins are tossed.

* We may, for example, consider our coins to be independent factors, the occurrence of a head to be the presence of a factor and the occurrence of a tail the absence of a factor. Factors will then be "present" or "absent" in the various heads-tails combinations.

If we have three independent factors operating, the expression (p + q)ⁿ becomes for three coins (H + T)³. Expanding this binomial, we get

$$ (H + T)^3 = H^3 + 3H^2T + 3HT^2 + T^3 $$

which may be written

1 H³     1 chance in 8 of 3 heads;                 probability ratio = 1/8
3 H²T    3 chances in 8 of 2 heads and 1 tail;     probability ratio = 3/8
3 HT²    3 chances in 8 of 1 head and 2 tails;     probability ratio = 3/8
1 T³     1 chance in 8 of 3 tails;                 probability ratio = 1/8
Total = 8

Again these results are identical with those got by listing the four different combinations possible when three coins are tossed.

The binomial expansion may be applied still more generally to those cases in which there are a larger number of independent factors operating. If we toss ten coins simultaneously, for instance, we have by analogy with the above (p + q)¹⁰. This expression may be written (H + T)¹⁰, H standing for the probability of a head, T for the probability of a non-head (tail), and 10 for the number of coins tossed. When the binomial (H + T)¹⁰ is expanded, the terms are

$$ H^{10} + 10H^9T + 45H^8T^2 + 120H^7T^3 + 210H^6T^4 + 252H^5T^5 + 210H^4T^6 + 120H^3T^7 + 45H^2T^8 + 10HT^9 + T^{10} $$

which may be summarized as follows:

                                                           Probability Ratio
1 H¹⁰       1 chance in 1024 of all coins falling heads          1/1024
10 H⁹T      10 chances in 1024 of 9 heads and 1 tail             10/1024
45 H⁸T²     45 chances in 1024 of 8 heads and 2 tails            45/1024
120 H⁷T³    120 chances in 1024 of 7 heads and 3 tails           120/1024
210 H⁶T⁴    210 chances in 1024 of 6 heads and 4 tails           210/1024
252 H⁵T⁵    252 chances in 1024 of 5 heads and 5 tails           252/1024
210 H⁴T⁶    210 chances in 1024 of 4 heads and 6 tails           210/1024
120 H³T⁷    120 chances in 1024 of 3 heads and 7 tails           120/1024
45 H²T⁸     45 chances in 1024 of 2 heads and 8 tails            45/1024
10 HT⁹      10 chances in 1024 of 1 head and 9 tails             10/1024
1 T¹⁰       1 chance in 1024 of all coins falling tails          1/1024
Total = 1024
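The coefficients of the expansion need not be found by multiplying out the binomial: the coefficient of the term with k heads in (H + T)ⁿ is the number of combinations of n things taken k at a time. A minimal Python sketch using `math.comb` (Python 3.8+); the function name is hypothetical.

```python
from math import comb


def binomial_terms(n):
    """(coefficient, heads, tails) for each term of (H + T)**n, from H**n down to T**n."""
    return [(comb(n, heads), heads, n - heads) for heads in range(n, -1, -1)]


if __name__ == "__main__":
    terms = binomial_terms(10)
    total = sum(c for c, _, _ in terms)            # 2**10 = 1024
    for c, h, t in terms:
        print(f"{c:>4} H^{h} T^{t}   {c}/{total}")
```

The coefficients always sum to 2ⁿ, the total number of arrangements, which is why each probability ratio above has 1024 as its denominator.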


[Figure: histogram and frequency polygon; Y-axis "Frequencies", 0 to 250; X-axis: the eleven terms H¹⁰, H⁹T, …, T¹⁰]

Fig. 21. Probability Surface Obtained from the Expansion of (H + T)¹⁰.

These data are represented graphically in Figure 21 by a histogram and frequency polygon plotted on the same axes. The eleven terms of the expansion have been laid off at equal distances along the X-axis, and the "chances" of the occurrence of each combination of H's and T's are plotted as frequencies on the Y-axis. The result is a symmetrical frequency polygon with the greatest concentration in the center and the "scores" falling away by corresponding decrements above and below the central high point. Figure 21 represents the results to be expected theoretically when ten coins are tossed 1024 times.

Many experiments have been conducted, in which coins were tossed or dice thrown a great many times, with the idea of checking theoretical against actual results. In one well-known experiment,* twelve dice were thrown 4096 times. Each four-, five-, and six-spot face was taken as a "success" and each one-, two-, and three-spot face as a "failure." Hence the probability of success and the probability of failure were the same. In a throw showing the faces 3, 1, 2, 6, 4, 6, 3, 4, 1, 5, 2, and 3, there would be five successes and seven failures. The observed frequency of the different numbers of successes and the theoretical outcomes obtained from the expansion of the binomial expression (p + q)¹² have been plotted on the same axes in Figure 22. The reader will note that the observed frequencies correspond quite closely to the theoretical, except for a tendency to shift slightly to the right. If, as an experiment, the reader will toss ten coins 1024 times, his results will be in close agreement with the theoretical outcomes shown in Figure 21.

[Figure: theoretical curve (solid) and actual curve (dashed); Y-axis: frequencies, 0 to 1000; X-axis: number of successes, 0 to 12]

Fig. 22. Comparison of Observed and Theoretical Results in Throwing Twelve Dice 4096 Times. (After Yule.)

* Weldon's experiment; see Yule, G. U., An Introduction to the Theory of Statistics (10th ed., 1932), p. 258.

In the discussion in this section, we have taken the probability of occurrence (e.g., H) and the probability of non-occurrence (non-H or T) of a given factor to be the same. This is not a necessary condition, however. For instance, the probability of an event's happening may be only 1/5; of its not happening, 4/5. Any probability ratio is possible as long as (p + q) = 1.00. But distributions obtained from the expansion of (p + q)ⁿ when p is not equal to q are "skewed" or asymmetrical and are not normal (p. 129).
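The theoretical frequencies in Figure 22 and the effect of unequal p and q can both be sketched with one Python function. Each term of (q + p)ⁿ is multiplied by the number of trials. The function name is hypothetical, and the second example (10 factors with p = 1/5) is an illustration, not data from the text.

```python
from math import comb


def expected_frequencies(n, p, trials):
    """Expected number of trials giving 0, 1, ..., n successes, from trials * (q + p)**n."""
    q = 1 - p
    return [trials * comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]


if __name__ == "__main__":
    # Weldon-type experiment: 12 dice, success = 4, 5 or 6 spots (p = 1/2), 4096 throws.
    print([round(f) for f in expected_frequencies(12, 0.5, 4096)])
    # Unequal probabilities, p = 1/5 and q = 4/5: the distribution is skewed.
    print([round(f, 3) for f in expected_frequencies(10, 0.2, 1)])
```

With p = q = 1/2 and 4096 = 2¹² throws, each expected frequency equals a binomial coefficient and the list is symmetrical. With p = 1/5 the largest term falls at 2 successes, and the long tail runs toward the high end.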


3. Use of the Probability Curve in Mental Measurement

The frequency curve plotted in Figure 21 from the expansion of the expression (H + T)¹⁰ is a symmetrical many-sided polygon. If the number of factors (e.g., coins) determining this polygon were increased from 10 to 20, to 30, and then to 100, say (the baseline extent remaining the same), the faces of the polygon would increase regularly in number. With each increase in the number of factors, the faces of the figure would become shorter, and the points on the frequency surface would move closer together. Finally, when the number of factors became very large, that is, when n in the expression (p + q)ⁿ became infinite, the polygon would exhibit a perfectly smooth surface like that of the curve in Figure 20. This "ideal" polygon or "normal" curve represents the frequency of occurrence of various combinations of a very large number of equal, similar, and independent factors (e.g., coins), when the probability of the appearance (e.g., H) or non-appearance (e.g., T) of each factor is the same.

If we compare the four graphs plotted from measures of height, intelligence, memory span, and barometric readings in Figure 19 with the normal probability polygon in Figure 20, the similarity of these diagrams to the normal curve is evident: there is a marked tendency for quantitative data to take the symmetrical, bell-shaped form. This general tendency may be stated in the form of a "principle" as follows: measurements of many natural phenomena and of many mental and social traits under certain conditions tend to be distributed symmetrically about their means in proportions which approximate those of the normal probability distribution.
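The approach of the binomial polygon to a smooth normal curve as n grows can be checked numerically. This Python sketch compares the terms of (1/2 + 1/2)¹⁰⁰ with the height of a normal curve having the same mean and standard deviation. It uses the standard binomial results, mean = np and σ = √(npq), which are assumptions here because they are not stated in this section. Function names are hypothetical.

```python
import math


def binomial_prob(n, k, p=0.5):
    """Probability of exactly k 'heads' among n independent factors, each with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_density(x, mean, sd):
    """Height of the unit-area normal curve with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    n = 100
    mean, sd = n * 0.5, math.sqrt(n * 0.5 * 0.5)   # np = 50 and sqrt(npq) = 5
    for k in range(40, 61, 5):
        print(k, round(binomial_prob(n, k), 4), round(normal_density(k, mean, sd), 4))
```

Already at n = 100 the two columns agree to about three decimal places at every point, which is the sense in which the many-sided polygon "becomes" the normal curve.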

Much evidence has accumulated to show that the normal distribution serves to describe the frequency of occurrence of many variable facts with a relatively high degree of accuracy. Various phenomena which follow the normal probability curve (at least approximately) may be classified as follows:

1. Biological statistics: the proportion of male to female births for the same country or community over a period of years; the proportion of different types of plants and animals in cross-fertilization (the Mendelian ratios).

2. Anthropometrical data: height, weight, cephalic index, etc., for large groups of the same age and sex.

3. Social and economic data: rates of birth, marriage, or death under certain constant conditions; wages and output of large numbers of workers in the same occupation under comparable conditions.

4. Psychological measurements: intelligence as measured by standard tests; speed of association, perception-span, reaction-time; educational test scores, e.g., in spelling, arithmetic, reading.

5. Errors of observation: measures of height, speed of movement, linear magnitudes, physical and mental traits, and the like, contain errors which are as likely to cause them to deviate above as below their true values. Chance errors of this sort vary in magnitude and sign and occur in frequencies which follow closely the normal probability curve.*

* This topic is treated in Chapter VII.

It is an interesting speculation that many frequency distributions of scores and other measures are similar to those obtained by tossing coins or throwing dice because the former, like the latter, are really probability distributions. The normal probability distribution, as we have seen, represents the frequency of occurrence of the various possible combinations of a great many factors (e.g., coins). In a normal distribution the n factors are taken to be similar, independent, and of equal strength; and the probability that each will be present (e.g., show an H) or absent (e.g., show a T) is the same. The appearance on a coin of a head or a tail is undoubtedly determined by a large number of small (or "chance") influences which are as likely to act one way as another. The twist with which the coin is tossed may be important, as well as …, the kind of surface upon which it falls, and other circumstances of a like sort. In the same way, the presence or absence of each one of the many genetic factors which determine the shape of a man's head, or his intelligence, or his personality, may depend upon a host of adventitious influences whose net effect we call "chance."

… the assumption, a priori, that all distributions of mental and physical traits which exhibit a symmetrical form … as do the head and … the success of his … the "normality" of the trait being measured.* … than some other type curve is sufficiently warranted by the fact that this distribution generally does fit the data better, and is more useful. But the "theoretical justification and the empirical use of the normal curve are two quite different matters."*

* McNemar, Q., The Revision of the Stanford-Binet Scale (1942), Chapter II.


II. PROPERTIES OF THE NORMAL PROBABILITY DISTRIBUTION


1. The Equation of the Normal Curve 
The equation of the normal probability curve reads

$$ y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2} \qquad (17) $$

(equation of the normal probability curve)

in which
x = scores (expressed as deviations from the mean) laid off along the baseline or X-axis.
y = the height of the curve above the X-axis, i.e., the frequency of a given x-value or the number achieving a certain score.

The other terms in the equation are constants:
N = number of cases.
σ = standard deviation of the distribution.
π = 3.1416 (the ratio of the circumference of a circle to its diameter).
e = 2.7183 (base of the Napierian system of logarithms).

When N and σ are known, it is possible from equation (17) to compute (1) the frequency (or y) of a given value x, i.e., the number of individuals making a certain score; and (2) the number, or percentage, of individuals scoring between two points, or above or below a given point in the distribution. But these calculations are rarely necessary, as tables are available from which this information may be readily obtained. A knowledge of these tables (Tables 17 and 18) is extremely valuable in the solution of a number of problems. For this reason it is very desirable that the construction and use of Tables 17 and 18 be clearly understood.

* Jones, D. C., A First Course in Statistics (1921), p. 233.
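Equation (17) translates directly into a function. In this Python sketch, N, σ, and the x-values in the usage lines are hypothetical, chosen so that deviations are in σ units and the total area is 10,000, the same convention used for Table 17. The result y is the height of the curve, i.e., the number of cases per unit of the x-scale at that point.

```python
import math


def normal_ordinate(x, n, sigma):
    """Height y of the normal curve at deviation x from the mean, by equation (17)."""
    return n / (sigma * math.sqrt(2 * math.pi)) * math.exp(-(x ** 2) / (2 * sigma ** 2))


if __name__ == "__main__":
    # Hypothetical distribution: N = 10,000 cases, sigma = 1 (deviations in sigma units).
    for x in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(x, round(normal_ordinate(x, 10_000, 1.0), 1))
```

Because x enters only as x², the function gives the same height at +x and −x, which is the bilateral symmetry noted earlier. The maximum ordinate, at the mean, is N/(σ√(2π)).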


2. Tables of Areas under the Normal Curve

Areas in terms of σ as unit. Table 17 gives the fractional parts of the total area under the normal curve found between the mean and ordinates (y's) erected at various distances from the mean. In Table 17 distances along the X-axis are measured in σ units (see Fig. 20). The total area under the curve (the number of scores in the distribution) is taken arbitrarily to be 10,000, because of the greater ease with which fractional parts of the total area may then be calculated. The first column of the table, x/σ, gives distances in tenths of σ measured off on the baseline of the normal curve from the mean as origin. We have already learned that x = X − M, i.e., that x measures the deviation of a score X from M. If x is divided by σ, deviation from the mean is expressed in σ-units. Such σ-deviation scores are often called standard scores, or z-scores (z = x/σ). Distances from the mean in hundredths of σ are given by the headings of the columns. To find the number of cases in a normal distribution between the mean and the ordinate erected at a distance of 1σ from the mean, go down the x/σ column until 1.0 is reached, and in the next column under .00 take the entry opposite 1.0, viz. 3413. This figure means that 3413 cases in 10,000, or 34.13% of the entire area of the curve, lie between the mean and 1σ. Put more exactly, 34.13% of the cases in a normal distribution fall within the area bounded by the baseline of the curve, the ordinate erected at the mean, the ordinate erected at a distance of 1σ from the mean, and the curve itself (see Fig. 20). To find the percentage of the distribution between the mean and 1.57σ, say, go down the x/σ column to 1.5, then across horizontally to the column headed .07, and take the entry 4418. This means that in a normal distribution, 44.18% of the area (N) lies between the mean and 1.57σ.
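The entries of Table 17 can be reproduced from the error function. The area between the mean and a point z = x/σ is ½·erf(z/√2) of the whole, here scaled to a total of 10,000 as in the table. This is a minimal Python sketch using `math.erf`; the function name is hypothetical.

```python
import math


def area_mean_to_z(z, total=10_000):
    """Cases (out of `total`) between the mean and a point z = x/sigma from it, as in Table 17."""
    return total * 0.5 * math.erf(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    print(round(area_mean_to_z(1.0)))    # entry for x/sigma = 1.00
    print(round(area_mean_to_z(1.57)))   # entry for x/sigma = 1.57
    # One full row of the table, x/sigma = 1.50 ... 1.59
    print([round(area_mean_to_z(1.5 + j / 100)) for j in range(10)])
```

Taking the absolute value of z reflects the symmetry of the curve: a point 1σ below the mean cuts off the same 3413 cases as a point 1σ above it.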




TABLE 17 


FRACTIONAL PARTS OF THE TOTAL AREA (TAKEN AS 10,000) UNDER THE
NORMAL PROBABILITY CURVE, CORRESPONDING TO DISTANCES ON
THE BASELINE BETWEEN THE MEAN AND SUCCESSIVE POINTS LAID
OFF FROM THE MEAN IN UNITS OF STANDARD DEVIATION

Example: between the mean and a point 1.38σ (x/σ = 1.38) are found
41.62% of the entire area under the curve.

x/σ     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09

0.0    0000    0040    0080    0120    0160    0199    0239    0279    0319    0359
0.1    0398    0438    0478    0517    0557    0596    0636    0675    0714    0753
0.2    0793    0832    0871    0910    0948    0987    1026    1064    1103    1141
0.3    1179    1217    1255    1293    1331    1368    1406    1443    1480    1517
0.4    1554    1591    1628    1664    1700    1736    1772    1808    1844    1879
0.5    1915    1950    1985    2019    2054    2088    2123    2157    2190    2224
0.6    2257    2291    2324    2357    2389    2422    2454    2486    2517    2549
0.7    2580    2611    2642    2673    2704    2734    2764    2794    2823    2852
0.8    2881    2910    2939    2967    2995    3023    3051    3078    3106    3133
0.9    3159    3186    3212    3238    3264    3289    3315    3340    3365    3389
1.0    3413    3438    3461    3485    3508    3531    3554    3577    3599    3621
1.1    3643    3665    3686    3708    3729    3749    3770    3790    3810    3830
1.2    3849    3869    3888    3907    3925    3944    3962    3980    3997    4015
1.3    4032    4049    4066    4082    4099    4115    4131    4147    4162    4177
1.4    4192    4207    4222    4236    4251    4265    4279    4292    4306    4319
1.5    4332    4345    4357    4370    4382    4394    4406    4418    4429    4441
1.6    4452    4463    4474    4484    4495    4505    4515    4525    4535    4545
1.7    4554    4564    4573    4582    4591    4599    4608    4616    4625    4633
1.8    4641    4649    4656    4664    4671    4678    4686    4693    4699    4706
1.9    4713    4719    4726    4732    4738    4744    4750    4756    4761    4767
2.0    4772    4778    4783    4788    4793    4798    4803    4808    4812    4817
2.1    4821    4826    4830    4834    4838    4842    4846    4850    4854    4857
2.2    4861    4864    4868    4871    4875    4878    4881    4884    4887    4890
2.3    4893    4896    4898    4901    4904    4906    4909    4911    4913    4916
2.4    4918    4920    4922    4925    4927    4929    4931    4932    4934    4936
2.5    4938    4940    4941    4943    4945    4946    4948    4949    4951    4952
2.6    4953    4955    4956    4957    4959    4960    4961    4962    4963    4964
2.7    4965    4966    4967    4968    4969    4970    4971    4972    4973    4974
2.8    4974    4975    4976    4977    4977    4978    4979    4979    4980    4981
2.9    4981    4982    4982    4983    4984    4984    4985    4985    4986    4986
3.0    4986.5  4986.9  4987.4  4987.8  4988.2  4988.6  4988.9  4989.3  4989.7  4990.0
3.1    4990.3  4990.6  4991.0  4991.3  4991.6  4991.8  4992.1  4992.4  4992.6  4992.9
3.2    4993.129
3.3    4995.166
3.4    4996.631
3.5    4997.674
3.6    4998.409
3.7    4998.922
3.8    4999.277
3.9    4999.519
4.0    4999.683
4.5    4999.966
5.0    4999.997133
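The entries of Table 17 can be reproduced from the cumulative normal distribution, to within rounding in the last place. The sketch below is illustrative only. It assumes Python 3.8+ and its standard-library `statistics.NormalDist`, and `area_mean_to_z` is a hypothetical helper name. It does not show how the printed table was originally computed.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1

def area_mean_to_z(z: float) -> float:
    """Area between the mean and z sigma, with the total area taken as 10,000."""
    return (UNIT_NORMAL.cdf(abs(z)) - 0.5) * 10_000

# Rebuild the x/sigma = 1.3 row of Table 17 (columns .00 through .09)
row_1_3 = [round(area_mean_to_z(1.3 + k / 100)) for k in range(10)]
print(row_1_3)

# The worked example under the table heading: 1.38 sigma, about 41.62% of the area
print(f"{area_mean_to_z(1.38) / 100:.2f}%")
```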


118 STATISTICS IN PSYCHOLOGY AND EDUCATION 


fall between −1PE and +1PE measured off from the mean (p. 54). Table 18 cannot be read in as fine units as Table 17, since only tenths and .05ths of PE divisions are given. If smaller divisions are desired, linear interpolation can readily be made with little error.

Just as we usually disregard that part of a normal curve beyond the limits ±3σ, we ordinarily ignore that part of the curve beyond the limits ±4PE. There are 9930 (4965 × 2) cases in the total 10,000 between the mean and ±4PE (Table 18). Hence, in cutting off the curve at ±4PE, we lose only .70 of 1% of the cases in the distribution (70 cases in 10,000).

There is little to choose between Tables 17 and 18. Table 17 admits of easier interpolation, but Table 18 is accurate enough, without interpolation, for most purposes. Table 17 is more often used in mental measurement.


3. Relationships among Constants of the Normal Probability Curve

In the normal probability curve, the mean, the median, and the mode all fall exactly at the midpoint of the distribution and are numerically equal. Since the normal curve is bilaterally symmetrical, all of the measures of central tendency must coincide.

Between the mean and ±1σ are found 68.26% of the cases in a normal distribution. Between the mean and ±2σ lie 95.44% of the cases, and between the mean and ±3σ lie 99.73%.

As we have seen, ±1PE mark off the middle 50% of the cases, i.e., the 25% of the measures directly above, and the 25% directly below, the measure of central tendency. Furthermore, ±2PE include 82.26% of the measures in the distribution; ±3PE, 95.70% of the measures; and ±4PE, 99.30% of the measures.
The following constant relations exist among the measures of variability:

1. $\mathrm{PE} = .6745\,\sigma$

2. $\sigma = 1.4826\,\mathrm{PE}$

These equations may be verified from the percents of area included by each. Thus, we find by interpolation in Table 17 that .6745σ (1PE) includes the 25% of the distribution just above (or below) the mean. Also, from Table 18, 1.48PE (1σ) includes the 34% of the distribution just above (or below) the mean. From these formulas it is evident why it was stated earlier (p. 59) that σ is always greater than Q (PE).
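Both constant relations, and the percentages quoted for ±2PE, ±3PE, and ±4PE, can be checked numerically. The sketch below assumes Python's `statistics.NormalDist`; the names `PE_IN_SIGMA`, `SIGMA_IN_PE`, and `percent_within_pe` are illustrative only. It works from the exact curve rather than a printed table, so its output may differ from the text in the last decimal place. For example, it gives about 82.27% rather than 82.26% for ±2PE.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()

# PE cuts off the middle 50% of the cases, so it is the 75th percentile of the unit curve
PE_IN_SIGMA = UNIT_NORMAL.inv_cdf(0.75)   # relation 1: PE = .6745 sigma
SIGMA_IN_PE = 1 / PE_IN_SIGMA              # relation 2: sigma = 1.4826 PE

def percent_within_pe(k: float) -> float:
    """Percent of a normal distribution lying between -k PE and +k PE from the mean."""
    z = k * PE_IN_SIGMA
    return 100 * (UNIT_NORMAL.cdf(z) - UNIT_NORMAL.cdf(-z))

print(f"PE = {PE_IN_SIGMA:.4f} sigma; sigma = {SIGMA_IN_PE:.4f} PE")
for k in (1, 2, 3, 4):
    print(f"+/-{k}PE: {percent_within_pe(k):.2f}%")
```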


III. MEASURING DIVERGENCE FROM NORMALITY


1. Skewness 

In a frequency polygon or histogram, usually the first thing which strikes the eye is the symmetry or the lack of symmetry in the figure. In the normal curve the mean, the median, and the mode all coincide, and there is perfect balance between the right and left halves of the figure.

A distribution is said to be "skewed" when the mean, the median, and the mode fall at different points in the distribution, and the balance (or center of gravity) is shifted to one side or the other, to right or left. It is important to know which of two explanations holds for the skewness that often occurs in distributions of test scores and other measures:

(1) it represents a real divergence from the normal form; or
(2) it is the result of chance fluctuations, arising from temporary causes, and is not significant of real discrepancy.

The degree of displacement or skewness in a frequency distribution may be determined by the formula


$$Sk = \frac{3(\text{Mean} - \text{Median})}{\sigma} \qquad (18)$$
(a measure of skewness in a frequency distribution)

In a normal distribution the mean equals the median and the skewness is 0. The more nearly the distribution approaches the normal form, the closer together are the mean and the median, and the less the skewness. Distributions are said to be skewed negatively, or to the left, when the scores are massed at the high end of the scale (the right end) and spread out gradually at the low or left end, as shown in Figure 23. Distributions are skewed positively, or to the right, when the scores are massed at the low (the left) end of the scale and spread out gradually toward the high or right end, as shown in Figure 24.

[Fig. 23. Negative Skewness: to the Left. (The mean lies to the left of the median.)]

[Fig. 24. Positive Skewness: to the Right. (The median lies to the left of the mean.)]


If we apply formula (18) to the distribution of fifty Army Alpha scores in Table 1, page 6, −.28 is obtained as a measure of skewness. This result points to a slight negative skewness in the data, which may be seen by reference to Figure 2, page 14.
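Formula (18) is simple enough to apply directly to raw scores. The sketch below assumes Python's `statistics` module and takes σ as the population standard deviation (`pstdev`, which divides by N). The text, by contrast, works from grouped frequency distributions. The ten scores are hypothetical, chosen only to show the sign convention: a mean below the median gives negative skewness.

```python
from statistics import mean, median, pstdev
from typing import Sequence

def skewness_18(m: float, mdn: float, sigma: float) -> float:
    """Formula (18): Sk = 3(Mean - Median) / sigma."""
    return 3 * (m - mdn) / sigma

def skewness_18_from_scores(scores: Sequence[float]) -> float:
    # pstdev divides by N, i.e. sigma of the distribution taken as a whole
    return skewness_18(mean(scores), median(scores), pstdev(scores))

# Hypothetical scores massed at the high end, with a tail toward the low end
scores = [2, 6, 7, 8, 8, 9, 9, 9, 10, 10]
print(round(skewness_18_from_scores(scores), 2))  # mean < median, so Sk is negative
```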




Formula (18) gives the measure of skewness for the distribution of the 200 cancellation scores (Table 3, p. 14) as .009. This negligible degree of positive skewness shows how closely this distribution approaches the symmetrical probability form.

Another measure of skewness is given by the formula

$$Sk = \frac{P_{90} + P_{10}}{2} - P_{50} \qquad (19)$$
(a measure of skewness in terms of percentiles)*

For the normal distribution, Sk by formula (19) is zero: P50 lies just midway between P10 and P90.

Applying this formula to the distributions of fifty Army Alpha scores and 200 cancellation scores, we obtain Sk = −2.50 for the first and Sk = .03 for the second. These results are numerically different from the measures of skewness obtained from formula (18) because the two measures are computed from different reference values in the distribution, and hence are not directly comparable. Note also that formula (18) yields a pure number, while formula (19) is expressed in score units. The two formulas agree, however, in indicating some negative skewness for the distribution of fifty Alpha scores, and an insignificant degree of positive skewness for the 200 cancellation scores. In comparing the skewness of two distributions we should use either formula (18) or formula (19), not first the one and then the other.
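Below is a sketch of formula (19), assuming Python's `statistics.quantiles` for the deciles and `statistics.NormalDist` for a normal check. The default "exclusive" interpolation of `quantiles` is not the grouped-data percentile method used in this book, so results on real data may differ slightly from hand computation. The helper names and the small set of scores are hypothetical.

```python
from statistics import NormalDist, quantiles
from typing import Sequence

def skewness_19(p10: float, p50: float, p90: float) -> float:
    """Formula (19): Sk = (P90 + P10)/2 - P50, expressed in score units."""
    return (p90 + p10) / 2 - p50

def skewness_19_from_scores(scores: Sequence[float]) -> float:
    # quantiles(..., n=10) returns the nine deciles P10, P20, ..., P90
    deciles = quantiles(scores, n=10)
    return skewness_19(deciles[0], deciles[4], deciles[8])

# In a normal distribution P50 lies midway between P10 and P90, so Sk is 0
normal = NormalDist(mu=50, sigma=10)
sk_normal = skewness_19(normal.inv_cdf(0.10), normal.inv_cdf(0.50), normal.inv_cdf(0.90))

# Hypothetical scores massed at the high end, with a tail toward the low end
scores = [2, 6, 7, 8, 8, 9, 9, 9, 10, 10]
print(round(sk_normal, 6), round(skewness_19_from_scores(scores), 2))
```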

The important question of how much skewness a distribution must exhibit before it may be said to be significantly skewed cannot be answered until we have calculated a "standard error" of our measure of skewness. Chapter VII, page 220, discusses two things: a formula for the standard error of Sk, when determined by formula (19), and a method of testing whether the skewness of a given distribution is significant.


* Kelley, T. L., Statistical Method (1923), p. 77. The terms in this formula, as given by Kelley, have been reversed so that the sign of Sk will agree with the conventional notion of positive and negative skewness.

2. Kurtosis

The term kurtosis refers to the "peakedness" or flatness of a frequency distribution as compared with the normal. A frequency distribution more peaked than the normal is said to be leptokurtic; one flatter than the normal, platykurtic. Figure 25 shows a leptokurtic distribution and a platykurtic distribution plotted on the same diagram around the same mean. A normal curve (called mesokurtic) has also been drawn in on the diagram to bring out the contrast in the figures, and to make comparison easier.

[Fig. 25. Leptokurtic (A), Normal or Mesokurtic (B) and Platykurtic (C) Curves. Baseline marked from −3σ to +3σ.]

A formula for measuring kurtosis, in which Q is the quartile deviation, is

$$Ku = \frac{Q}{P_{90} - P_{10}} \qquad (20)$$
(a measure of kurtosis in terms of percentiles)

For the normal curve, formula (20) gives Ku = .263.* If Ku is greater than .263 the distribution is platykurtic; if less than .263, the distribution is leptokurtic. Calculating the kurtosis of the distributions of fifty Alpha scores and 200 cancellation scores, discussed above, we obtain Ku = .237 for the first distribution and Ku = .223 for the second. Both distributions, therefore, are slightly leptokurtic.

To determine whether the kurtosis in a distribution is significant, that is, whether the curve is too high or too flat to be treated as sensibly normal, we must evaluate Ku in terms of its standard error. A formula for the standard error of Ku, and a method of determining the significance of an obtained measure of Ku, will be given in Chapter VII, page 220.

* From Table 18, we find that P90 = 1.90PE and P10 = −1.90PE. Hence, by formula (20),

$$Ku = \frac{1.00\,\mathrm{PE}}{1.90\,\mathrm{PE} - (-1.90\,\mathrm{PE})} = \frac{1.00}{3.80} = .263$$
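The following is a minimal sketch of formula (20). It assumes Python's `statistics.NormalDist` for the unit-normal percentiles, and `kurtosis_20` and `describe` are hypothetical names. It reproduces the value .263 derived in the footnote. As an extra illustration not taken from the text, it also shows that a flat, rectangular distribution gives a larger Ku, i.e. is platykurtic.

```python
from statistics import NormalDist

def kurtosis_20(p10: float, p25: float, p75: float, p90: float) -> float:
    """Formula (20): Ku = Q / (P90 - P10), where Q = (P75 - P25) / 2."""
    q = (p75 - p25) / 2
    return q / (p90 - p10)

def describe(ku: float, normal_ku: float = 0.263) -> str:
    # Above the normal value means flatter (platykurtic); below means more peaked
    if ku > normal_ku:
        return "platykurtic"
    if ku < normal_ku:
        return "leptokurtic"
    return "mesokurtic"

z = NormalDist()  # unit normal curve
ku_normal = kurtosis_20(z.inv_cdf(0.10), z.inv_cdf(0.25), z.inv_cdf(0.75), z.inv_cdf(0.90))
ku_rect = kurtosis_20(0.10, 0.25, 0.75, 0.90)  # percentiles of a rectangular distribution on 0..1
print(f"normal: {ku_normal:.3f}; rectangular: {ku_rect:.3f} ({describe(ku_rect)})")
print(describe(0.237), describe(0.223))  # the Alpha and cancellation values quoted in the text
```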


3. Comparing a Given Histogram or Frequency Polygon with a Normal Curve of the Same Area, M and σ

In this section methods will be described for superimposing on a given histogram or frequency polygon a normal curve of the same N, M, and σ as the actual distribution. Such a normal curve is the "best fitting" normal distribution for the given data. The research worker often wishes to compare his distribution "by eye" with that normal curve which "best fits" the data. Such a comparison may profitably be made even if no measures of divergence from normality are computed. In fact, the direction and extent of asymmetry often strike us more convincingly when seen in a graph than when expressed by measures of skewness and kurtosis. It may be noted that a normal curve can always be readily constructed by following the procedures given here, provided the area (N) and variability (σ) are known.

Table 19 shows the frequency distribution of scores made on the Thorndike Intelligence Examination by 206 college freshmen. The mean is 81.59, the median 81.00, and the σ 12.14. This frequency distribution has been plotted in Figure 26. Over it, on the same axes, has been drawn the best fitting normal curve, i.e., the normal curve which best describes these data. The Thorndike scores are represented by a histogram instead of by a frequency polygon in order to prevent coincidence of the surface outlines and to bring out more clearly agreement and disagreement at different points.

To plot a normal curve over this histogram, we first compute the height of the maximum ordinate (y₀), or the frequency at the middle of the distribution. The maximum ordinate can be determined from the equation of the normal curve given on page 113. When x in this equation is put equal to zero (the x at the mean of the normal curve is 0), the term $e^{-x^2/2\sigma^2}$ equals 1.00, and

$$y_0 = \frac{N}{\sigma\sqrt{2\pi}}$$

In the present problem, N = 206, σ = 2.43* (in units of class-interval), and √(2π) = 2.51; hence y₀ = 33.8 (see Fig. 26 for calculations). Knowing y₀, we are able to compute from Table 20 the heights of ordinates at given distances from the mean. The entries in Table 20 give the heights of the ordinates in the normal probability curve, at various σ distances from the mean, expressed as fractions of y₀. The ordinate 1σ removed from M, for instance, is .60653 y₀; that is, the frequency at ±1σ is about 61% of the maximum frequency at the middle of the distribution.

In Figure 26 the ordinates ±1σ from M are .60653 × 33.8 (y₀), or 20.5. The ordinates ±2σ from M are .13534 × 33.8, or 4.6; and the ordinates ±3σ from M are .01111 × 33.8, or .4.

* σ = 12.14/5 = 2.43, the class interval being 5 score units. The σ in class-interval units is used in the equation, since the units on the X axis are in terms of class-intervals.

TABLE 19

FREQUENCY DISTRIBUTION OF THE SCORES MADE BY 206 FRESHMEN
ON THE THORNDIKE INTELLIGENCE EXAMINATION

Scores        f
115-119       1
110-114       2
105-109       4
100-104      10        Mean = 81.59
 95-99       13        Median = 81.00
 90-94       18        σ = 12.14
 85-89       34
 80-84       30
 75-79       37
 70-74       27
 65-69       15
 60-64       10
 55-59        2
 50-54        2
 45-49        1
          N = 206

[Fig. 26. Frequency Distribution of the Scores of 206 Freshmen on the Thorndike Intelligence Examination, Compared with Best-Fitting Normal Curve for Same Data. (For data, see Table 19.) Normal curve ordinates at the mean, ±1σ, ±2σ, ±3σ:
y₀ = N/(σ√(2π)) = 206/(2.43 × 2.51) = 33.8
±1σ: .60653 × 33.8 = 20.5
±2σ: .13534 × 33.8 = 4.6
±3σ: .01111 × 33.8 = .4]

The normal curve may be sketched in without much difficulty through the ordinates at these seven points. Somewhat greater accuracy may be obtained if various intermediate ordinates, for example, at ±.5σ, ±1.5σ, etc., are also plotted. The ordinates for the curve in Figure 26 at ±.5σ are .88250 × 33.8, or 29.8; at ±1.5σ, .32465 × 33.8, or 11.0; etc.
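The arithmetic for fitting the normal curve to Table 19 can be sketched as follows, assuming only Python's `math` module; the function names are illustrative. σ is entered as 2.43 class intervals, as in the text. Using the unrounded 12.14/5 would shift some ordinates slightly.

```python
import math

def max_ordinate(n: int, sigma_in_intervals: float) -> float:
    """y0 = N / (sigma * sqrt(2 pi)), with sigma expressed in class-interval units."""
    return n / (sigma_in_intervals * math.sqrt(2 * math.pi))

def ordinate(y0: float, x_over_sigma: float) -> float:
    """Height of the fitted curve x/sigma units from the mean: y0 * exp(-(x/sigma)^2 / 2)."""
    return y0 * math.exp(-(x_over_sigma ** 2) / 2)

# Table 19: N = 206, sigma = 12.14 / 5 = 2.43 class intervals (rounded as in the text)
y0 = max_ordinate(206, 2.43)
for d in (0, 0.5, 1, 1.5, 2, 3):
    print(f"ordinate at +/-{d} sigma: {ordinate(y0, d):.1f}")
```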

From formula (19) the skewness of our distribution of 206 scores is found to be 1.25. This small value indicates a low degree of positive skewness in the data. The kurtosis of the distribution by formula (20) is .244, and the distribution appears to be slightly leptokurtic (this is shown by the "peak" rising above the normal curve). Neither measure of divergence, however, is significant of a "real" discrepancy between our data and the normal distribution (see p. 220). On the whole, then, the normal curve plotted in Figure 26 fits the obtained distribution well enough to warrant our treating these data as sensibly normal.

TABLE 20

ORDINATES OF THE NORMAL PROBABILITY CURVE EXPRESSED AS
FRACTIONAL PARTS OF THE MEAN ORDINATE, y₀

The height of the ordinate erected at the mean can be computed from y₀ = N/(σ√(2π)), where √(2π) = 2.51. The height of any other ordinate, in terms of y₀, can be read from the table when one knows the distance which the ordinate is from the mean.

For example: the height of an ordinate at a distance of 1.50σ from the mean is .32465 y₀; the height of an ordinate at a distance of −2.37σ from the mean is .06029 y₀.

x/σ     .00      .01     .02     .03     .04     .05     .06     .07     .08     .09

0.0   1.00000  .99995  .99980  .99955  .99920  .99875  .99820  .99755  .99681  .99596
0.1    .99501  .99397  .99283  .99158  .99025  .98881  .98728  .98565  .98393  .98211
0.2    .98020  .97819  .97609  .97390  .97161  .96923  .96676  .96420  .96156  .95882
0.3    .95600  .95309  .95010  .94702  .94384  .94059  .93723  .93382  .93034  .92677
0.4    .92312  .91939  .91558  .91169  .90774  .90371  .89961  .89543  .89119  .88688
0.5    .88250  .87805  .87353  .86896  .86433  .85962  .85488  .85006  .84519  .84025
0.6    .83527  .83023  .82514  .82000  .81481  .80957  .80429  .79896  .79359  .78817
0.7    .78270  .77721  .77167  .76610  .76048  .75484  .74916  .74342  .73769  .73193
0.8    .72615  .72033  .71448  .70861  .70272  .69681  .69087  .68493  .67896  .67298
0.9    .66698  .66097  .65494  .64891  .64288  .63683  .63077  .62472  .61865  .61259
1.0    .60653  .60047  .59440  .58834  .58228  .57623  .57017  .56414  .55810  .55209
1.1    .54607  .54007  .53409  .52812  .52215  .51620  .51027  .50437  .49848  .49260
1.2    .48675  .48092  .47511  .46933  .46357  .45783  .45212  .44644  .44078  .43516
1.3    .42956  .42399  .41845  .41294  .40747  .40202  .39661  .39123  .38589  .38058
1.4    .37531  .37007  .36487  .35971  .35459  .34950  .34445  .33944  .33447  .32954
1.5    .32465  .31980  .31500  .31023  .30550  .30082  .29618  .29158  .28702  .28251
1.6    .27804  .27361  .26923  .26489  .26059  .25634  .25213  .24797  .24385  .23978
1.7    .23575  .23176  .22782  .22392  .22007  .21627  .21251  .20879  .20511  .20148
1.8    .19790  .19436  .19086  .18741  .18400  .18064  .17732  .17404  .17081  .16762
1.9    .16448  .16137  .15831  .15530  .15232  .14939  .14650  .14364  .14083  .13806
2.0    .13534  .13265  .13000  .12740  .12483  .12230  .11981  .11737  .11496  .11259
2.1    .11025  .10796  .10570  .10347  .10129  .09914  .09702  .09495  .09290  .09090
2.2    .08892  .08698  .08507  .08320  .08137  .07956  .07778  .07604  .07433  .07265
2.3    .07100  .06939  .06780  .06624  .06471  .06321  .06174  .06029  .05888  .05750
2.4    .05614  .05480  .05350  .05222  .05096  .04973  .04852  .04734  .04618  .04505
2.5    .04394  .04285  .04179  .04074  .03972  .03873  .03775  .03680  .03586  .03494
2.6    .03405  .03317  .03232  .03148  .03066  .02986  .02908  .02831  .02757  .02684
2.7    .02612  .02543  .02474  .02408  .02343  .02280  .02218  .02157  .02098  .02040
2.8    .01984  .01929  .01876  .01823  .01772  .01723  .01674  .01627  .01581  .01536
2.9    .01492  .01449  .01408  .01367  .01328  .01288  .01252  .01215  .01179  .01145

x/σ     3.0     3.1     3.2     3.3     3.4     3.5     3.6     3.7     3.8     3.9
      .01111  .00819  .00598  .00432  .00309  .00219  .00153  .00106  .00073  .00050

x/σ     4.0     4.1     4.2     4.3     4.4     4.5     4.6     4.7     4.8     4.9
      .00034  .00022  .00015  .00010  .00006  .00004  .00003  .00002  .00001  .00001

x/σ     5.0
      .00000
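Each entry of Table 20 is simply $e^{-x^2/2\sigma^2}$ evaluated at the given x/σ, so the table can be regenerated. The sketch below assumes Python's `math` module, and the helper names are hypothetical. Computed values may differ from the printed table by a unit in the fifth decimal; the −2.37σ example, for instance, evaluates to about .06030 rather than .06029.

```python
import math
from typing import List

def ordinate_fraction(x_over_sigma: float) -> float:
    """Table 20 entry: the ordinate at x/sigma as a fraction of the mean ordinate y0."""
    return math.exp(-(x_over_sigma ** 2) / 2)

def table_20_row(tenths: float) -> List[float]:
    """Entries for x/sigma = tenths + .00, .01, ..., .09, rounded to five decimals."""
    return [round(ordinate_fraction(tenths + k / 100), 5) for k in range(10)]

print(table_20_row(1.0))
# The two examples printed with the table
print(f"{ordinate_fraction(1.50):.5f}  {ordinate_fraction(-2.37):.5f}")
```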


IV. WHY FREQUENCY DISTRIBUTIONS DEVIATE FROM THE NORMAL FORM


It is often important for the research worker to know why 
his distributions diverge from the normal form, and this is es- 
pecially true when the deviation from normality is large and 
significant (p. 220). The reasons why distributions exhibit 
skewness and kurtosis are numerous and often complex, but a 
careful analysis of the data will often permit the setting up of 
hypotheses concerning the causes of non-normality which may 
be tested experimentally. The common causes of asymmetry, 
all of which must be taken into consideration by the careful 
experimenter, will be summarized in the present section. 


1. Unrepresentative or Biased Sampling 

Selection is a potent cause of asymmetry. We should hardly 
expect the distribution of I.Q.’s obtained from a group of twenty- 
five ten-year-old boys (all superior students) to be normal; nor 
would we look for symmetry in the distribution of I.Q.’s got 
from a special class of dull-normal ten-year-old boys, even 
though the group were fairly large. Neither of these groups is an unbiased selection (i.e., a cross-section) from the population of ten-year-old boys; in addition, the first group is quite small. A small sample is not necessarily unrepresentative, but more often than not it is.

Selection will produce skewness and kurtosis in distributions even when the test has been adequately constructed and carefully administered. For example, consider a group of elementary school pupils which contains any of the following:

(a) a large proportion of bilinguals;
(b) many children of very low or very high socio-economic status;
(c) a large number of pupils over-age for grade or accelerated.

Such a group will almost surely return skewed distributions of test scores, even upon standard intelligence and educational achievement examinations.

Scores made by small and homogeneous groups are likely to yield leptokurtic distributions, while scores from large and heterogeneous groups are more likely to be platykurtic. The
distribution of scores achieved upon an educational examina- 
tion by pupils throughout the elementary grades, as well as 
the distribution of chronological ages for these same pupils, 
will probably be somewhat flattened owing to the considerable 
overlap from grade to grade. 

Distributions of physical traits, such as height, weight, and strength, are also affected by selection. Measurements of physical traits in large groups of the same age, sex, and race will closely approximate the normal form (p. 111). But the distribution of height for fourteen-year-old girls in the high school of a small city, or the distribution of weight for freshmen in a midwestern college, will probably be skewed, as these groups are subject to selection in various traits related to height and weight.


2. Use of Unsuitable or Poorly Made Tests

If a test is too easy, scores will pile up at the high end of the distribution, while if the test is too hard, scores will pile up at the low end. It is probable also that both distributions will be somewhat more "peaked" (leptokurtic) than the normal.

Asymmetry in cases like these may be explained in terms of 
those small positive and negative factors which determine the 
normal distribution. Too easy a test excludes from operation 
some of the factors which would make for an extension of the 
curve at the upper end, such as knowledge of more advanced 
arithmetical processes which the brighter child would know. 
Too hard a test excludes from operation factors which make 
for the extension of the distribution at the low end, such as 
knowledge of those very simple facts which would have per- 
mitted the answering of a few at least of the easier questions 
had these been included. In the first case we have a number of 
perfect scores and little discrimination; in the second case a 
number of zero scores and equally poor differentiation. Be- 
sides the matter of difficulty in the test, asymmetry may be 
brought about by ambiguous or poorly made items and by 
other technical faults.* 


3. The Measurement of Traits the Distributions of Which Are Not Normal

Skewness or kurtosis or both may also appear owing to a real lack of normality in the trait being measured.† Non-normality of distribution will arise, for instance, when some of the hypothetical factors determining performance in a trait are dominant or prepotent over the others, and hence are present more often than chance will allow. Illustrations may be found in distributions resulting from the throwing of loaded dice. When off-center or biased dice are cast, the resulting distribution will certainly be skewed and probably peaked, owing to the greater likelihood of certain combinations of faces. The same is true of biased coins.

* Hawkes, Lindquist and Mann, The Construction and Use of Achievement Examinations (1936), Chapters II and III.

† There is no reason why all distributions should approach the normal form. Thorndike has written: "There is nothing arbitrary or mysterious about variability which makes the so-called normal type of distribution a necessity, or any more rational than any other sort, or even more to be expected on a priori grounds. Nature does not abhor irregular distributions." (Theory of Mental and Social Measurements (1913), pp. 88-89.)

Suppose, for example, that the probability of "success" (appearance of H) is 4/5 and the probability of failure (non-occurrence of H, or appearance of T) is 1/5, so that p = 4/5, q = 1/5, and (p + q) = 1.00. If we think of the factors making for success or failure as 3 in number, we may expand (p + q)³ to find the probability of success and failure in varying degree. Thus, (p + q)³ = p³ + 3p²q + 3pq² + q³, and substituting p = 4/5 and q = 1/5, we have

$$
\begin{aligned}
p^3 &= (4/5)^3 = \tfrac{64}{125} \\
3p^2q &= 3(4/5)^2(1/5) = \tfrac{48}{125} \\
3pq^2 &= 3(4/5)(1/5)^2 = \tfrac{12}{125} \\
q^3 &= (1/5)^3 = \tfrac{1}{125}
\end{aligned}
$$

Expressed as a frequency distribution:

Successes     f
    3        64
    2        48
    1        12
    0         1
          N = 125
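The expansion of (p + q)³ generalizes to any number of factors through binomial coefficients. The sketch below assumes Python 3.8+ (`fractions.Fraction` and `math.comb`), and `binomial_terms` is a hypothetical helper name. Multiplying each exact probability by 125 recovers the frequencies 64, 48, 12, and 1.

```python
from fractions import Fraction
from math import comb
from typing import Dict

def binomial_terms(p: Fraction, n: int) -> Dict[int, Fraction]:
    """Probability of exactly k successes in n independent trials, for k = n down to 0.

    These are the successive terms of the expansion of (p + q)**n, where q = 1 - p.
    """
    q = 1 - p
    return {k: comb(n, k) * p**k * q**(n - k) for k in range(n, -1, -1)}

terms = binomial_terms(Fraction(4, 5), 3)
for successes, prob in terms.items():
    # 5**3 = 125, so prob * 125 gives the frequency out of 125
    print(successes, prob, prob * 125)
```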


The numerators of the probability ratios (frequency of success) may be plotted in the form of a histogram to give Figure 27.

Note that this distribution is negatively skewed (to the left): the incidence of three "successes" is 64, of two 48, of one 12, and of none 1. J-shaped distributions like this one are essentially non-normal. Such curves have been most often found by psychologists to describe certain forms of social behavior.

For example, suppose that we tabulate the number of students who appear at a lecture "on time," and the number who come in five, ten, and fifteen-plus minutes late. If frequency of arrival is plotted against time, the curve will be highest at zero ("on time"), at the Y-axis, and will fall off rapidly as we go to the right; that is, it will be positively skewed and J-shaped (see Figure 24). If only the early-comers are tallied, up to the "on time" group, the curve will be a negatively skewed J-curve like those in Figures 23 and 27.

J-curves describe behavior which is essentially non-normal in occurrence because the causes of the behavior differ greatly in strength. But J-curves may also represent frequency distributions badly skewed for other reasons. We have seen in (1) and (2) above that selection and poorly chosen tests can produce distributions which closely resemble J-curves.

[Fig. 27. Frequency Polygon of the expansion (p + q)³, where p = 4/5 and q = 1/5; p is the probability of success, q the probability of failure. Frequencies 1, 12, 48, and 64 for 0, 1, 2, and 3 successes.]

[Fig. 28. U-shaped Frequency Curve.]


True J-curves often occur in medical statistics. The frequency of death due to degenerative disease, for instance, is highest during maturity and old age and minimal during the early years. If age is laid off on the baseline and frequency of death plotted on the Y-axis, the curve will be negatively skewed and will closely resemble Figure 23. Factors making for death are prepotent over those making for survival as age increases, and hence the curve is essentially asymmetrical. In the case of a childhood disease, the occurrence of death will be positively skewed when plotted against age, as the probability of death becomes less with increase in age.

Another non-normal distribution, which may be mentioned briefly, is the U-shaped curve shown in Figure 28. U-shaped distributions, like J-curves, are probably more often encountered in the measurement of social and personality traits than in the measurement of mental abilities.

Suppose, for instance, that the distribution of a large group of college freshmen upon an intelligence examination has been drawn up. Now suppose we determine, for each score category, the proportion of students who report more than a stipulated number of "neurotic" symptoms. It is likely that the high- and low-scoring students will report more symptoms than the intermediate-scoring students. Accordingly, the curve for symptoms will be U-shaped; that is, it will rise at both ends.

Again, suppose that all pupils in an elementary school below I.Q. 75 and above I.Q. 120 are taught in special classes. Then, since the total number of such children will probably be largest in the low and high grades, a plot of these pupils by grades will tend to be U-shaped.


4. The Influence upon Distribution Form of Errors Made in 
the Construction and Administration of Tests 


There are a number of factors besides those already mentioned 
which make for asymmetry in score distributions. Differences 
in the size of the units in which a trait has been measured, for 
example, will lead to skewness. Thus, if the test items are very easy at the beginning and very hard later on, an increment of one point of score at the upper end of the test scale will be much greater than an increment of one point at the low end of the scale. The effect of such unequal or "rubbery" units is the same as that encountered when the test is too easy: scores tend to pile up toward the high end of the scale and be stretched out or skewed toward the low end.

Several further factors can distort the form of a distribution:

(1) errors in administration of a test, as in timing or giving instructions;
(2) errors in the use of scoring stencils;
(3) large differences in practice or in motivation among the subjects.

All of these factors, if they cause many students to score higher or lower than they normally would, will make for asymmetry in the distribution.


PROBLEMS 


1. In two throws of a coin, what is the probability of throwing at least one head?


2. What is the probability of throwing exactly one head in three throws of a coin?

3. Five coins are thrown. What is the probability that exactly two of them will be heads?

4. If the probability of answering a certain question correctly is four 
times the probability of answering it incorrectly, what is the prob- 
ability of answering it correctly? 

5. A rat has five choices to make of alternate routes in order to reach the food-box. If it is true that for each choice the odds are two to one in favor of the correct pathway, what is the probability that the rat will make all of its choices correctly?

6. Assume that trait X is completely determined by 6 factors, all similar and independent, and each as likely to be present as absent. Plot the distribution which one might expect to get from the measurement of trait X in an unselected group of 1000 people.


7. Toss five pennies thirty-two times, and record the number of heads and tails after each throw. Plot frequency polygons of obtained and expected occurrences on the same axes. Compare the M's and σ's of obtained and expected distributions.

8. What percentage of a normal distribution is included between the
   (a) mean and 1.54σ          (d) −3.5PE and 1.0PE
   (b) mean and −2.7PE         (e) .66σ and 1.78σ
   (c) −1.73σ and .56σ         (f) −1.8PE and −2.5PE


9. In a normal distribution 


(a) Determine Ps, Ру, Ры, and Pa in c-units. 


(b) What are the percentile ranks of scores at −1.23σ, −.50σ, and +.84σ?


10. (a) Compute measures of skewness and kurtosis for each of the four frequency distributions in Chapter II, Problem 1, page 46.
    (b) Fit normal probability curves to these same distributions, using the method given on page 123.
    (c) For each distribution, compare the percentage of cases lying between ±1σ with the 68.26% found in the normal distribution.


ANSWERS

1. 3/4     2. 3/8     3. 10/32     4. 4/5     5. 32/243

7. For expected distribution: M = 2.5, σ = 1.12

8. (a) .4382     (b) .4657     (c) .6705     (d) .7409     (e) .2171     (f) .0665

9. (a) −.61σ, −.10σ, .10σ, .88σ
   (b) 11, 31, 80

10. (a)          Skewness                          Kurtosis
                 By formula (18)   By formula (19)   By formula (20)
         (1)       −.018                              .239
         (2)        .156             1.03             .277
         (3)        .071              .55             .222
         (4)        .032                              .248

    (c) 66%, 67%, 66%, 66%
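Several of the answers above can be checked with exact fractions and the unit normal curve. The sketch below assumes Python 3.8+ (`math.comb`, `fractions.Fraction`, `statistics.NormalDist`), and the helper names are illustrative. Note that `Fraction` prints 10/32 in lowest terms, as 5/16.

```python
from fractions import Fraction
from math import comb, sqrt
from statistics import NormalDist

HALF = Fraction(1, 2)  # probability of a head with a fair coin

def prob_exactly(k: int, n: int, p: Fraction) -> Fraction:
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

ans1 = 1 - prob_exactly(0, 2, HALF)   # at least one head in two throws
ans2 = prob_exactly(1, 3, HALF)       # exactly one head in three throws
ans3 = prob_exactly(2, 5, HALF)       # exactly two heads among five coins
q4 = Fraction(1, 4 + 1)               # p = 4q and p + q = 1, so 5q = 1
ans4 = 4 * q4
ans5 = Fraction(2, 3) ** 5            # odds of 2 to 1 on each of five choices
print(ans1, ans2, ans3, ans4, ans5)

# Problem 7: expected M = np and sigma = sqrt(npq) for five coins
m7, s7 = 5 * 0.5, sqrt(5 * 0.5 * 0.5)
print(m7, round(s7, 2))

# Problem 9(b): percentile ranks of scores at the given sigma distances
unit = NormalDist()
print([round(100 * unit.cdf(z)) for z in (-1.23, -0.50, 0.84)])
```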


CHAPTER VI 


APPLICATIONS OF THE NORMAL PROBABILITY CURVE

I. PROBLEMS INVOLVING PROPORTIONS OF AREA WITHIN DIFFERENT PARTS OF THE NORMAL DISTRIBUTION

This section will consider a number of problems which may be readily solved if we can assume that the distributions of scores with which we are dealing may be treated as normal, or at least as approximately normal, in form. Each general problem will be illustrated by several examples. These examples are intended to present the issues concretely, and should be carefully worked through by the student. Constant reference will be made to Tables 17 and 18, and a knowledge of how to use these tables is essential.


1. To Determine the Percentage of Cases in a Normal Distribution Which Fall within Given Limits

Example (1) Given a normal distribution with a mean of 12 and a σ of 4. (a) What percentage of the cases fall between 8 and 16? (b) What percentage of the cases lie above 18? (c) Below 6?

(a) A score of 16* is four points above the mean, and a score of 8 is four points below the mean. If we divide this scale distance of four score units by the σ of the distribution (i.e., by 4), it is clear that 16 is 1σ above the mean, and that 8 is 1σ below the mean (see Fig. 29, p. 136). There are 68.26% of the cases in a normal distribution between the mean and ±1σ (Table 17). Hence, 68.26% of the scores in this distribution, or approximately the middle two-thirds, fall between 8 and 16.

* A score of 16 is the midpoint of the interval 15.5 to 16.5.

This result may also be stated in terms of "chances." Since 68.26% of the cases in the given distribution fall between 8 and 16, the chances are about 68 in 100 that any score in the distribution will be found between these limits.

(b) A score of 18 is six score units, or 1.5σ, above the mean (6/4 = 1.5). From Table 17 we find that 43.32% of the cases in the entire distribution fall between the mean and 1.5σ. Accordingly, 6.68% of the cases (.5000 − .4332) must lie above 18, in order to fill out the 50% of cases in the upper half of the curve (Fig. 29). Stated in terms of chances, there are 668 chances in 10,000, or about 7 in 100, that any score in the distribution will lie above 18.

[Fig. 29. Normal distribution with mean 12 and σ = 4; scores 6, 8, 16, and 18 marked on the baseline.]

(c) A score of 6 is −1.5σ from the mean. Between the mean and a score of 6 (−1.5σ) are 43.32% of the cases in the whole distribution. Hence, about 7% of the cases lie below 6 (to fill out the 50% below the mean), and the chances are 7 in 100 that any score in the distribution will fall below 6.
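The three answers of Example (1) can also be obtained directly from the normal cumulative distribution, assuming Python's `statistics.NormalDist`; the variable names are illustrative. This uses the exact curve rather than Table 17, so part (a) comes out as about 68.27% instead of 68.26%.

```python
from statistics import NormalDist

dist = NormalDist(mu=12, sigma=4)  # Example (1): M = 12, sigma = 4

between_8_and_16 = dist.cdf(16) - dist.cdf(8)   # (a)
above_18 = 1 - dist.cdf(18)                     # (b)
below_6 = dist.cdf(6)                           # (c)
print(f"(a) {100 * between_8_and_16:.2f}%  (b) {100 * above_18:.2f}%  (c) {100 * below_6:.2f}%")
```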

Example (2) Given a normal distribution with a mean of 
29.75, and a Q of 4.56. What percentage of the distribution 


lie between 22 and 26? What are the chances that a score will 
fall between 22 and 26? 


APPLICATIONS OF NORMAL PROBABILITY CURVE 137 


In a normal distribution Q = PE. A score of 22 is 7.75 units,
or −1.70PE (7.75/4.56 = 1.70), from the mean; and a score of
26 is 3.75 units, or −.82PE, from the mean (Fig. 30, below). From
Table 18 we know that 37.42% of the cases in a normal
distribution lie between the mean and −1.70PE, and that 20.99%
(by interpolation) of the cases lie between the mean and
−.82PE. By simple subtraction, therefore, 16.43% of the
cases fall between −1.70PE and −.82PE, or between 22 and
26. The chances are about 16 in 100 that a score will fall
between 22 and 26.


[Fig. 30. Normal curve with mean 29.75; baseline marked at scores 22 and 26.]
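The same area can be computed without tables. This is a minimal sketch that assumes, as the text does, a normal distribution, so that Q = PE and 1 PE is the 75th-percentile point of the unit normal curve (about .6745σ); the constant name `PE_IN_SIGMA` is ours:

```python
from statistics import NormalDist

# In a normal distribution Q equals the probable error (PE), and
# 1 PE is the 75th-percentile point of the unit normal curve (about .6745 sigma).
PE_IN_SIGMA = NormalDist().inv_cdf(0.75)

mean, q = 29.75, 4.56
sigma = q / PE_IN_SIGMA                 # about 6.76
dist = NormalDist(mu=mean, sigma=sigma)

# Distances of the two scores from the mean, in PE units.
pe_22 = (22 - mean) / q                 # about -1.70
pe_26 = (26 - mean) / q                 # about -0.82

pct_between = 100 * (dist.cdf(26) - dist.cdf(22))
print(f"{pe_22:.2f} PE, {pe_26:.2f} PE, {pct_between:.1f}% between 22 and 26")
```

The exact area is about 16.4%, in line with the 16.43% obtained from Table 18.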


2. To Find the Limits in Any Normal Distribution Which Will
Include a Given Percentage of the Cases

Example (1) Given a normal distribution with a mean of
16.00 and a σ of 4.00. What limits will include the middle
75% of the cases?

The middle 75% of the cases in a normal distribution must
include the 37.5% just above and the 37.5% just below the
mean. From Table 17 we find that 3749 cases in 10,000, or
37.5% of the distribution, fall between the mean and 1.15σ;
and, of course, 37.5% of the distribution also fall between the
mean and −1.15σ. The middle 75% of the cases, therefore,
lie between −1.15σ and +1.15σ; or, since σ = 4.00, between
4.60 score units below and 4.60 score units above the mean.
Adding ±4.60 to the mean (16.00), we find that the middle 75%
of the scores in the given distribution lie between 11.40 and
20.60 (see Fig. 31, below).
[Fig. 31. Normal curve with mean 16.00 and σ = 4.00; the middle 75% of the cases lies between 11.40 and 20.60.]

Example (2) Given a normal distribution with a median of
150.00 and a Q of 26.00. What limits will include the highest
20% of the distribution? The lowest 10%?


The highest 20% of a normally distributed group will have
30% of the cases between its lower limit and the median, since
50% of the cases lie in the right half of the distribution. From
Table 18, we know that 3004 cases in 10,000, or 30% of the
distribution, fall between the median and 1.25PE. Since the
PE of the given distribution is 26.00 (Q = PE), 1.25PE will be
1.25 × 26.00, or 32.5 points above the median, namely, at 182.5.
The lower limit of the highest 20% of the given group, therefore,
is 182.5; and the upper limit is the highest score in the
distribution, whatever that may be.

The lowest 10% of a normally distributed group will have
40% of the cases between the median and its upper limit.
Exactly 40% of the distribution fall between the median and
−1.90PE. The PE of the given distribution is 26.00; hence,
−1.90PE will be 1.90 × 26.00, or 49.4 score units below the
median, that is, at 100.6. The upper limit of the lowest 10% of
scores in the given group, therefore, is 100.6; and the lower
limit is the lowest score in the distribution.
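Both examples amount to reading the normal curve backwards, from a percentage to a score. The sketch below assumes normality (so the median equals the mean and Q = PE ≈ .6745σ) and uses `NormalDist.inv_cdf` in place of Tables 17 and 18:

```python
from statistics import NormalDist

PE_IN_SIGMA = NormalDist().inv_cdf(0.75)   # 1 PE = about .6745 sigma

# Example (1): middle 75% of a normal distribution with mean 16.00, sigma 4.00.
# The middle 75% leaves 12.5% in each tail.
dist1 = NormalDist(mu=16.00, sigma=4.00)
lower_75, upper_75 = dist1.inv_cdf(0.125), dist1.inv_cdf(0.875)

# Example (2): median 150.00 and Q (= PE) 26.00, so sigma = 26.00 / .6745.
dist2 = NormalDist(mu=150.00, sigma=26.00 / PE_IN_SIGMA)
top_20_lower_limit = dist2.inv_cdf(0.80)      # 80% of cases lie below this point
bottom_10_upper_limit = dist2.inv_cdf(0.10)   # 10% of cases lie below this point

print(f"middle 75%: {lower_75:.2f} to {upper_75:.2f}")
print(f"highest 20% starts at {top_20_lower_limit:.1f}")
print(f"lowest 10% ends at {bottom_10_upper_limit:.1f}")
```

The limit for the highest 20% comes out at 182.4 rather than 182.5 only because the text rounds the PE distance to 1.25.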


3. To Compare Two Distributions in Terms of “Overlapping” 


Example (1) Given the distributions of the scores made on a
logical memory test by 300 boys and 250 girls (Table 21).
The boys’ mean score is 21.49 with a σ of 3.63. The girls’
mean score is 23.68 with a σ of 5.12. The medians are: boys,
21.41, and girls, 23.66. What percentage of boys exceed the
median of the girls' distribution?

On the assumption that these distributions are sensibly
normal, we may solve this problem by means of Table 17. The
girls’ median is 23.66 − 21.49, or 2.17 score units, above the
boys' mean. Dividing 2.17 by 3.63 (the σ of the boys' distribution),
we find that the girls’ median is .60σ above the mean of the
boys’ distribution. Table 17 shows that 23% of a normal
distribution lie between the mean and .60σ; hence 27% of the
boys (50% − 23%) exceed the girls’ median.
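The normal-curve solution fits in a few lines. This sketch simply treats the boys' scores as normal with the stated mean and σ; `NormalDist` replaces the table lookup:

```python
from statistics import NormalDist

# Boys' scores treated as normally distributed (mean 21.49, sigma 3.63).
boys = NormalDist(mu=21.49, sigma=3.63)
girls_median = 23.66

z = (girls_median - boys.mean) / boys.stdev             # about .60 sigma
pct_boys_above = 100 * (1 - boys.cdf(girls_median))
print(f"girls' median at {z:.2f} sigma; {pct_boys_above:.0f}% of boys exceed it")
```

Unrounded, the figure is about 27.5%; reading Table 17 at .60σ gives the 27% of the text.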

This problem may also be solved by direct calculation from
the distributions of boys’ and girls’ scores without any
assumption as to normality of distribution. The calculations are
shown in Table 21; and it will be interesting to compare the result
found by direct calculation with that obtained by use of the
probability tables. The problem is to find the number of boys
whose scores exceed 23.66, the girls’ median, and then turn this
number into a percentage. There are 217 boys who score up to
23.5 (lower limit of 23.5 to 27.5). The class-interval 23.5 to 27.5
contains 68 scores; hence there are 68/4, or 17, scores per scale
unit on this interval. We wish to reach 23.66 in the boys'
distribution. This point is .16 of a score (23.66 − 23.50 = .16)
above 23.5, so that 2.72 (i.e., 17 × .16) of the boys' scores fall
between 23.5 and 23.66. Adding 2.72 to 217, we find that 219.72
of the boys' scores fall below 23.66, the girls’ median. Since
300 − 219.72 = 80.28, it is clear that 80.28/300, or 26.76%
(approximately 27%), of the boys exceed the girls’ median. This
result is in almost perfect agreement with that obtained above.
Apparently the assumption of normality of distribution for the
boys’ scores is justified.

TABLE 21

TO ILLUSTRATE THE METHOD OF DETERMINING OVERLAPPING
BY DIRECT CALCULATION FROM THE DISTRIBUTION

               Boys                                  Girls
   Scores              f
   27.5 to 31.5       15
   23.5 to 27.5       68
   19.5 to 23.5      128
   15.5 to 19.5       79
   11.5 to 15.5       10
   N = 300                                       N = 250
   N/2 = 150
   Mdn = 19.5 + (61/128) × 4 = 21.41             Mdn = 23.66
   M = 21.49                                     M = 23.68
   σ = 3.63                                      σ = 5.12

What percent of the boys exceed 23.66, the median of the girls? First,
217 boys make scores below 23.5. The class-interval 23.5-27.5 contains 68
scores; hence, there are 68/4, or 17, scores per scale unit on this interval.

The girls’ median, 23.66, is .16 above 23.5, the lower limit of interval
23.5-27.5. If we multiply 17 (number of scores per scale unit) by .16 we
obtain 2.72, which is the number of scores we must count into interval
23.5-27.5 to reach 23.66.

Adding 217 and 2.72, we obtain 219.72 as that part of the boys’ distribution
which falls below the point 23.66 (girls’ median). N is 300; hence
300 − 219.72 gives 80.28 as that part of the boys’ distribution which lies
above 23.66. Dividing 80.28 by 300, we find that 26.76%, or about 27%,
of the boys exceed the girls’ median.
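The direct calculation can be written as a small routine. This sketch assumes, as the text does, that the cases within each class-interval are spread evenly over it; the function name `count_below` is ours, not the book's:

```python
# Boys' frequency distribution from Table 21: (lower limit, upper limit, f).
boys_intervals = [
    (11.5, 15.5, 10),
    (15.5, 19.5, 79),
    (19.5, 23.5, 128),
    (23.5, 27.5, 68),
    (27.5, 31.5, 15),
]

def count_below(point, intervals):
    """Number of cases below `point`, spreading each interval's
    frequency evenly over the interval (linear interpolation)."""
    total = 0.0
    for lower, upper, f in intervals:
        if point >= upper:
            total += f                                      # whole interval lies below
        elif point > lower:
            total += f * (point - lower) / (upper - lower)  # part of the interval
    return total

n = sum(f for _, _, f in boys_intervals)                    # 300
below = count_below(23.66, boys_intervals)                  # 217 + 2.72
pct_above = 100 * (n - below) / n
print(f"{below:.2f} boys below 23.66; {pct_above:.2f}% above")  # 219.72 boys below 23.66; 26.76% above
```

No assumption about the shape of the distribution enters here, which is why the text prefers this route for small or irregular groups.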

The agreement between the percentage of overlapping found
by direct calculation from the distribution and that found by
use of the probability tables will nearly always be close,
especially if the groups are large and the distributions fairly
symmetrical. When the overlapping distributions are small and
not very regular in outline, it is safer to use the method of
direct calculation, since no assumption as to form of distribution
is then made.




4. To Determine the Relative Difficulty of Test Questions,
Problems, and Other Test Items

Example (1) Given a test question or problem solved by
10% of a large unselected group; a second problem solved by
20% of the same group; and a third problem solved by 30%.
If we assume the capacity measured by the test problems to be
distributed normally, what is the relative difficulty of
questions 1, 2, and 3?


[Fig. 32. Normal curve with the points .52σ, .84σ, and 1.28σ above the mean marked on the baseline.]

Our first task is to find for Question 1 a position in the
distribution such that 10% of the entire group (the percent
passing) lie above, and 90% (the percent failing) lie below, the
given point. The highest 10% in a normally distributed group
has 40% of the cases between its lower limit and the mean
(see Fig. 32, above). From Table 17 we find that 39.97%
(i.e., 40%) of a normal distribution fall between the mean and
1.28σ. Hence, Question 1 belongs at a point on the baseline
of the curve a distance of 1.28σ from the mean; and,
accordingly, 1.28σ may be set down as the difficulty value of
this question.

Question 2, passed by 20% of the group, falls at a point in
the distribution 30% above the mean. From Table 17 it is
found that 29.95% (i.e., 30%) of the group fall between the
mean and .84σ; hence, Question 2 has a difficulty value of .84σ.
Question 3, which lies at a point in the distribution 20% above
the mean, has a difficulty value of .52σ, since 19.85% of the
distribution fall between the mean and .52σ. To summarize
our results:


Question    Passed by    σ-value    σ-difference
   1           10%         1.28          -
   2           20%          .84         .44
   3           30%          .52         .32


The σ-difference in difficulty between Questions 2 and 3 is .32,
which is roughly 3/4 of the σ-difference (.44) in difficulty between
Questions 1 and 2. Since the percentage difference is the same
(10%) in the two comparisons, it is evident that when ability is
assumed to follow the normal distribution, σ-differences, and not
percentage differences, are the better indices of differences in
difficulty.
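The conversion from percent passing to σ-value is an inverse-normal lookup. The sketch below assumes ability is normal and defines a hypothetical helper `sigma_difficulty` (our name) that returns the σ distance from the mean of the point with the given percent of the group above it:

```python
from statistics import NormalDist

unit_normal = NormalDist()   # mean 0, sigma 1

def sigma_difficulty(pct_passing):
    """Sigma distance from the mean of the point with pct_passing above it."""
    return unit_normal.inv_cdf(1 - pct_passing / 100)

passed = {1: 10, 2: 20, 3: 30}                 # question -> percent passing
values = {q: sigma_difficulty(p) for q, p in passed.items()}

for q in sorted(values):
    print(q, f"{values[q]:.2f}")
print(f"1-2 difference {values[1] - values[2]:.2f}, "
      f"2-3 difference {values[2] - values[3]:.2f}")
```

Equal 10% steps in percent passing give unequal σ steps (.44 and .32), which is the point the summary table makes.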


Example (2) Given three test items, 1, 2, and 3, passed by 
50%, 40%, and 30%, respectively, of a large group. On the 
assumption of normality of distribution, what percentage 
of this group must pass test item 4, in order for it to be as 
much more difficult than 3, as 2 is more difficult than 1? 


An item passed by 50% of a group is, of course, failed by
50%; and, accordingly, such an item falls exactly in the middle
of a normal distribution of “difficulty.” Test item 1, therefore,
has a σ-value of .00, since it falls exactly at the mean (Fig. 33).
Test item 2 lies at a point in the distribution 10% above the
mean, since 40% of the group passed, and 60% failed, this item.
Accordingly, the σ-value of item 2 is .25, since from Table 17
we find that 9.87% (roughly 10%) of the cases lie between the
mean and .25σ. Test item 3, passed by 30% of the group, lies
at a point 20% above the mean, and this item has a difficulty
value of .52σ, as 19.85% (20%) of the normal distribution fall
between the mean and .52σ.

[Fig. 33. Normal curve; test items 2, 3, and 4 located at .25σ, .52σ, and .77σ above the mean.]

Since item 2 is .25σ farther along on the difficulty scale (toward
the high-score end of the curve) than item 1, it is clear
that item 4 must be .25σ above item 3 if it is to be as much
harder than item 3 as item 2 is harder than item 1. Item 4,
therefore, must have a value of .52σ + .25σ, or .77σ; and from
Table 17 we find that 27.94% (28%) of the distribution fall
between the mean and this point. This means that 50% − 28%,
or 22%, of the group must pass item 4. To summarize:


Test Item    Passed by    σ-value    σ-difference
    1            50%         .00           -
    2            40%         .25          .25
    3            30%         .52          .27
    4            22%         .77          .25


A test item, therefore, must be passed by 22% of the group
in order for it to be as much more difficult than an item passed
by 30% as an item passed by 40% is more difficult than one
passed by 50%. Note again that percentage differences are not
reliable indices of differences in difficulty when the capacity
measured is distributed normally.
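The reasoning for item 4 (step up by the same σ-distance, then convert back to a percentage) can be sketched as follows, again assuming normality; `sigma_difficulty` and `pct_passing_at` are illustrative helper names, not the book's:

```python
from statistics import NormalDist

unit_normal = NormalDist()

def sigma_difficulty(pct_passing):
    """Sigma value of an item passed by pct_passing percent of the group."""
    return unit_normal.inv_cdf(1 - pct_passing / 100)

def pct_passing_at(sigma_value):
    """Percent of the group lying above a given sigma value."""
    return 100 * (1 - unit_normal.cdf(sigma_value))

s1, s2, s3 = (sigma_difficulty(p) for p in (50, 40, 30))
s4 = s3 + (s2 - s1)     # item 4 sits as far above item 3 as item 2 sits above item 1
print(f"item 4 at {s4:.2f} sigma, passed by {pct_passing_at(s4):.1f}%")
```

With unrounded values item 4 falls at about .78σ and is passed by about 21.8%; the text's rounded figures give .77σ and 22%.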




5. To Separate a Given Group into Sub-Groups According to
Capacity, When the Trait Measured Is Assumed to be
Normally Distributed


Example (1) Suppose that we have administered a certain 
examination to 100 college students. We wish to classify our 
group into five sub-groups A, B, C, D, and E according to 
ability, the range of ability to be equal in each sub-group. On 
the assumption that the trait measured by our examination is 
normally distributed, how many students should be placed 
in groups A, B, C, D, and E?


[Fig. 34. Normal curve whose baseline, from −3σ to +3σ, is divided into five equal segments for sub-groups E, D, C, B, and A.]

Let us first represent the positions of the five sub-groups
diagrammatically on a normal curve as shown in Figure 34,
above. If the baseline of the curve is considered to extend from
−3σ to +3σ, that is, over a range of 6σ, dividing this range
by 5 (the number of sub-groups) gives 1.2σ as the baseline
extent to be allotted to each group. These five intervals may be
laid off on the baseline as shown in the figure, and perpendiculars
erected to demarcate the various sub-groups. Group A covers
the upper 1.2σ; group B the next 1.2σ; group C lies .6σ to the
right and .6σ to the left of the mean; groups D and E occupy
the same relative positions in the lower half of the curve that
B and A occupy in the upper half.

To find what percentage of the whole group belongs in A, we
must find what percentage of a normal distribution lies between
3σ (upper limit of the A group) and 1.8σ (lower limit of the A
group). From Table 17, 49.86% of a normal distribution is
found to lie between the mean and 3σ, and 46.41% between
the mean and 1.8σ. Hence, 3.5% of the total area under the
normal curve (49.86% − 46.41%) lies between 3σ and 1.8σ; and,
accordingly, group A comprises 3.5% of the whole group.

The percentages in the other groups are calculated in the
same way. Thus, 46.41% of the normal distribution fall between
the mean and 1.8σ (upper limit of group B), and 22.57%
fall between the mean and .6σ (lower limit of group B).
Subtracting, we find that 46.41% − 22.57%, or 23.84%, of our
distribution belongs in sub-group B. Group C lies from .6σ above
to .6σ below the mean. Between the mean and .6σ are
22.57% of the normal distribution, and the same percent lies
between the mean and −.6σ. Group C, therefore, includes
45.14% (22.57 × 2) of the distribution. Finally, sub-group D,
which lies between −.6σ and −1.8σ, contains exactly the same
percentage of the distribution as sub-group B; and group E,
which lies between −1.8σ and −3σ, contains the same percent
of the whole distribution as group A. The percentage and
number of men in each group are given in the following table:

                                           Groups
                                    A       B      C      D       E
Percent of total in each group     3.5    23.8   45.1   23.8    3.5
Number in each group
  (100 men in all)                3 or 4   24     45     24   3 or 4


On the assumption that the capacity measured follows the
normal curve, it is clear that three or four men in our group
of 100 should be placed in group A, the “very high” ability group;
twenty-four in group B, the “high average” ability group;
forty-five in group C, the “average” ability group; twenty-four
in group D, the “low average” ability group; and three or
four in group E, the “very low” or “inferior” group.
The above procedure may be used to determine how many
students in a class should be assigned to each of any given
number of grade-groups. It must be remembered that the
assumption is made that performance in the subject matter
upon which the individuals are being marked is represented by
the normal curve. The larger and more unselected the group,
the more nearly is this assumption justified.
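The five-group division can be sketched directly from the normal curve. This assumes, as the text does, a baseline running from −3σ to +3σ cut into five equal 1.2σ segments; the 0.27% of cases beyond ±3σ is simply left unassigned here:

```python
from statistics import NormalDist

unit_normal = NormalDist()

# Baseline from -3 sigma to +3 sigma cut into five equal steps of 1.2 sigma.
cuts = [-3.0 + 1.2 * i for i in range(6)]   # -3.0, -1.8, -0.6, 0.6, 1.8, 3.0 (up to rounding)
labels = ["E", "D", "C", "B", "A"]          # lowest group first
n_students = 100

shares = {}
for label, lo, hi in zip(labels, cuts, cuts[1:]):
    shares[label] = 100 * (unit_normal.cdf(hi) - unit_normal.cdf(lo))

for label in reversed(labels):
    pct = shares[label]
    print(f"{label}: {pct:5.2f}%  ~{pct * n_students / 100:4.1f} students")
```

The exact shares (about 3.46%, 23.83%, and 45.15%) round to the 3.5, 23.8, and 45.1 of the table; changing `cuts` and `labels` adapts the sketch to any number of grade-groups.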


II. THE SCALING OF TEST ITEMS


1. The Arrangement of Test Items into a Scale in Which the
Difficulty of Each Item Is Known with Reference to an
Arbitrary Zero Point

The psychologist often wishes to construct scales which shall
contain problems or questions graded in difficulty from very
easy to very difficult by known steps or intervals. Given a set
of problems or test items, if we know what proportion of a large
group passes each problem, it is comparatively easy to arrange
the problems in a percentage order of difficulty. Such an
arrangement constitutes a “scale,” to be sure; but it is a very
crude scale, since we know only roughly the steps in difficulty
from item to item.

In constructing scaled tests, the σ or PE of the distribution,
rather than the percent passing, is taken as the unit of
measurement. When the variability of the group is employed as a
scaling unit, we are able not only to arrange test items in order
of difficulty but to “set” or space them at definite points along
a difficulty scale. To illustrate how test items are scaled when
the unit of measurement is the σ or PE of the group, let us
suppose that we wish to construct a scale for measuring “reasoning
ability” (e.g., by means of syllogisms) in twelve-year-old
children; or a test of arithmetic problems for Grade IV; or a scale
for testing sentence memory in eight-year-old children. The
successive steps involved in constructing such a scale may be
outlined as follows:


(1) First compile a large number of problems or other test items.
These items should vary in difficulty from very easy to very
hard and should be representative of the field covered
by the test.

(2) Administer the items or problems to as large and as randomly
selected a group as can be assembled from among
those for whom the test is eventually intended.

(3) Compute the percentage of the group solving each problem
correctly. Duplicate items and those too easy or too hard
or unsatisfactory for one reason or another should be
discarded. The problems retained for the scale are then
arranged in order of percentage difficulty. A problem solved
correctly by 90% of the group is obviously less difficult than
one solved correctly by 75%; while the second problem is,
in turn, clearly less difficult than one solved correctly by
50%. The greater the percentage passing an item, the
lower the position of this item in a scale of difficulty.

(4) By means of Table 18 convert the percentage solving each
problem correctly into PE distances above or below the
mean.* The procedure in detail is as follows: A problem
solved correctly by 40% of the group is 10%, or about .40PE,
above the mean. A problem solved correctly by 78% of the
group is 28% (78% − 50%), or 1.15PE, below the mean.
We may tabulate the results for five items, selected at
random, as follows (see Fig. 35, below):


Problems                                   A       B      C      D      E
Percent solving correctly                 93      78     55     40     14
Distance from mean in percentage terms  −43     −28     −5     10     36
Distance from mean in PE units         −2.20   −1.15   −.20    .40   1.60


Problem A is solved by 93% of the group, i.e., by the
upper 50% (the right half of the curve) plus the 43% to the
left of the mean. This puts problem A at a point −2.20PE
from the mean. In the same way, the percentage distance
of each problem from the mean (measured in the plus or
minus direction) is found by subtracting the percentage
passing from 50%. From these percentages, the PE distance
of the problem above or below the mean can be read
from Table 18.†

[Fig. 35. Normal curve with baseline in PE units (−4PE to +4PE); problems A, B, C, D, and E located at −2.20, −1.15, −.20, .40, and 1.60PE, and the arbitrary zero point at −2.45PE.]

* The procedure is identical when σ is employed instead of PE.

† PE's are taken for the percentage nearest to the given value, without
interpolation.

(5) When the PE distance of each problem above or below the
mean has been established, calculate the PE distance of
each problem from the “zero point” of ability in the test.
A zero point may be located in the following way: Suppose
that 5% of the whole group fails to solve a single problem
correctly. This would put the level of zero ability in this
test 45% of the distribution below the mean, or at a
point −2.45PE from the mean.‡ The PE distance of each
problem in the scale may now be calculated from this
arbitrary zero point. To illustrate with the five problems
above:

Problems                                  A      B      C      D      E
PE distance from mean                  −2.20  −1.15   −.20    .40   1.60
PE distance from arbitrary zero,
  i.e., from −2.45PE                     .25   1.30   2.25   2.85   4.05

The simplest method of finding PE distances from a given
zero is to subtract the zero point algebraically from the
PE distance of each problem from the mean. Problem A,
for example, is −2.20 − (−2.45), or .25PE, from the
arbitrary zero point; and Problem E is 1.60 − (−2.45), or
4.05PE, from the zero point. The PE value of each of the
other problems, as measured from the arbitrary zero point,
is found in the same way. When the PE value-from-zero
of each of the problems intended for the test has been
determined, the difficulty value of each problem with
respect to every other problem, as well as with respect to
the arbitrary zero, is known, and the scale is finished.

‡ This value is an arbitrary, not a true, zero. It serves, however, as a
convenient reference point (point of minimum ability) from which to
measure performance. The points −4.00PE or −3.00σ are also convenient
reference points.
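Steps (4) and (5) can be sketched in a few lines, assuming normality and using exact inverse-normal values where the text reads the nearest entry of Table 18 without interpolation. The helper name `pe_from_mean` is ours:

```python
from statistics import NormalDist

unit_normal = NormalDist()
PE_IN_SIGMA = unit_normal.inv_cdf(0.75)      # 1 PE = about .6745 sigma

def pe_from_mean(pct_passing):
    """PE distance of an item from the mean; negative means below the mean."""
    return unit_normal.inv_cdf(1 - pct_passing / 100) / PE_IN_SIGMA

# Arbitrary zero: the point below which lie the 5% who solve nothing.
zero_point = unit_normal.inv_cdf(0.05) / PE_IN_SIGMA      # about -2.44 PE

percent_solving = {"A": 93, "B": 78, "C": 55, "D": 40, "E": 14}
scale = {}
for item, pct in percent_solving.items():
    from_mean = pe_from_mean(pct)
    scale[item] = (from_mean, from_mean - zero_point)
    print(f"{item}: {from_mean:6.2f} PE from mean, "
          f"{from_mean - zero_point:5.2f} PE from zero")
```

Because the text takes the nearest tabled value, its figures (for example −2.45 for the zero point and .40 for problem D) differ from these unrounded ones by a few hundredths of a PE.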


2. Scaling Total Scores on a Test
(1) Normalizing a Frequency Distribution: The T-Scale


In the last section we saw how separate test items are scaled
in PE-units on the assumption of normality in the trait
measured. We shall now describe a method of scaling score totals
or aggregates of items, a procedure usually followed in standard
educational achievement tests.

The method consists essentially in “normalizing” the
distribution of test scores. This is done by transforming original
test scores into equivalent scores in a normal distribution.
Equivalent scores are defined as measures which indicate the
same levels of ability. Suppose that in a given test 84% of
the group score below 124. Then 124 is equivalent to a score
of +1σ in a normal distribution, since 84% (approximately)
of a normal distribution fall below (to the left of) +1σ. As we
shall see later, normalizing a distribution of test scores alters
the original test units (stretching them out or compressing them),
and the more skewed the original distribution, the greater the
change in unit.

The obtained scores of a distribution may be transformed into
various systems of “new” or normalized scores. The method
outlined in this section leads to a normalized system of scores
called T-scores. T-scaling was devised by McCall* and first
used by him in constructing a series of reading tests designed
for use in the elementary grades. The original T-scale was
drawn up from the scores achieved by 500 twelve-year-olds
upon a reading test; and the scores made by other age groups
on these tests were expressed in terms of twelve-year-old
T-scores. Since the first use of the method, T-scaling has been
employed with various groups and no longer has reference
specifically to twelve-year-olds or to reading tests.

Procedure in T-scaling can best be shown by an example.
We shall outline the process in a series of steps and illustrate
each step by reference to the data in Table 22.


TABLE 22

TO ILLUSTRATE THE CALCULATION OF T-SCORES

  (1)      (2)      (3)           (4)             (5)          (6)
  Test                       Cum. f below       Col. (4)
  Score     f     Cum. f    score + 1/2 f      in %'s      T-Scores
                              on score
   10       1       62          61.5            99.2          74
    9       4       61          59              95.2          67
    8       6       57          54              87.1          61
    7      10       51          46              74.2          56
    6       8       41          37              59.7          52
    5      13       33          26.5            42.7          48
    4      18       20          11              17.7          41
    3       2        2           1               1.6          28
         N = 62


(1) Compile a large and representative group of test items which
vary in difficulty from easy to hard. Administer these
items to a sample of subjects (children or adults) for whom
the test is intended eventually.

* McCall, W. A., How to Measure in Education (1929), Chapter X,
pp. 272-306.

(2) Compute the percent passing each item. These percents
may be converted into σ-units so that the items selected
for inclusion in the final test are arranged in order of
difficulty in terms of σ. Since a precise measure of relative
difficulty is not important at this stage, however, items in
the final test may be arranged simply in order of percentage
difficulty (number passing).

(3) Administer the final test to a representative sample and
tabulate the distribution of scores. Total scores may be
scaled as shown in Table 22 for a group of sixty-two
subjects. In column (1) of Table 22 the test scores are entered.
In column (2) are the frequencies, i.e., numbers of subjects
who achieve various scores. Two subjects, for example,
had scores of 3, 18 had scores of 4, 13 had scores of 5, and so
on. In column (3) scores have been cumulated (p. 74) from the
low to the high end of the frequency distribution. Column
(4) shows the number of subjects who fall below each score
plus one-half of those who achieve the given score. The
entries in this column may be computed readily from
columns (2) and (3). Since there are no scores below 3 and
two scores on 3, the number below plus one-half on 3 is 1.
There are two scores below 4 [column (3)] and eighteen on
4 [column (2)]; hence the number below plus one-half on 4 is
9 + 2, or 11. There are twenty scores below 5 [column (3)]
and thirteen scores on 5 [column (2)]; hence the number
below plus one-half on 5 is 20 + 6.5, or 26.5. One-half of
the frequency on a given score must be added to the number
of scores falling below the score because a score is an interval,
not a point. The score of 4, for example, is the interval 3.5
to 4.5, midpoint 4.0. If the eighteen frequencies on 4 are
thought of as distributed evenly over the interval, nine will
lie below and nine above 4.0, the midpoint. Hence, if we add
nine to the two scores below 4 (i.e., below 3.5), we obtain
eleven as the number of scores below 4.0, the midpoint of the
interval 3.5 to 4.5. Each sum in column (4) is up to the
midpoint of a score-interval.

In column (5) the entries in column (4) are expressed as
percentages of N (62). Thus 99.2% of the scores lie below
10.0, midpoint of 9.5 to 10.5; 95.2% of the scores lie below
9.0, etc.

(4) Turn the percents in column (5) into T-scores by means of
Table 23. T-scores (to two places) in Table 23 corresponding
to percentages nearest those wanted are taken without
interpolation, as fractional T-scores are a needless refinement.
Thus 1.39 (T-score = 28) is taken for 1.6; 18.41
(T-score = 41) for 17.7, and so on.
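Steps (3) and (4) can be sketched in Python. Instead of reading Table 23 without interpolation, this sketch (our choice, not the book's procedure) inverts the normal curve exactly with `statistics.NormalDist.inv_cdf`, so its T-scores are unrounded:

```python
from statistics import NormalDist

# Frequencies from Table 22: test score -> number of subjects.
freq = {3: 2, 4: 18, 5: 13, 6: 8, 7: 10, 8: 6, 9: 4, 10: 1}
n = sum(freq.values())                      # 62
unit_normal = NormalDist()

rows = []
cum_below = 0                               # subjects scoring below the current score
for score in sorted(freq):
    f = freq[score]
    col4 = cum_below + f / 2                # below the score + 1/2 of those on it
    col5 = 100 * col4 / n                   # column (4) as a percent of N
    t = 50 + 10 * unit_normal.inv_cdf(col5 / 100)   # unrounded T-score
    rows.append((score, f, cum_below + f, col4, col5, t))
    cum_below += f

for score, f, cum_f, col4, col5, t in reversed(rows):
    print(f"{score:>2} {f:>3} {cum_f:>3} {col4:>5} {col5:5.1f} {t:5.1f}")
```

Rounded to whole numbers, these T-scores agree with column (6) except for the score of 3, where exact inversion gives about 28.6 against the tabled 28.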


Figure 36 shows a histogram plotted from the distribution
of the sixty-two scores in Table 22. Note that the scores 3, 4,
5, etc., are spaced at equal intervals along the baseline, i.e.,
along the scale of scores. When these scores are transformed
into equivalent normal curve scores, that is, into T-scores, they
occupy the positions in the normal curve shown in Figure 37.
The unequal scale distances between the scores in Figure 37
show clearly that, on the assumption of normality in the trait,
the original scores do not represent equal difficulty steps.

[Fig. 36. Histogram of the Sixty-two Scores in Table 22; scores 3 through 10 along the baseline.]
[Fig. 37. Normalized Distribution of the Scores in Table 22 and Figure 36. Original scores and T-score equivalents are shown on baseline.]

TABLE 23

TO FACILITATE THE CALCULATION OF T-SCORES

The percents refer to the percentage of the total frequency below a
given score + 1/2 of the frequency on that score. T-scores are
read directly from the given percentages.

Percent   T-score      Percent   T-score
 .0032       10         53.98       51
 .0048       11         57.93       52
 .007        12         61.79       53
 .011        13         65.54       54
 .016        14         69.15       55
 .023        15         72.57       56
 .034        16         75.80       57
 .048        17         78.81       58
 .069        18         81.59       59
 .097        19         84.13       60
 .13         20         86.43       61
 .19         21         88.49       62
 .26         22         90.32       63
 .35         23         91.92       64
 .47         24         93.32       65
 .62         25         94.52       66
 .82         26         95.54       67
1.07         27         96.41       68
1.39         28         97.13       69
1.79         29         97.72       70
2.28         30         98.21       71
2.87         31         98.61       72
3.59         32         98.93       73
4.46         33         99.18       74
5.48         34         99.38       75
6.68         35         99.53       76
8.08         36         99.65       77
9.68         37         99.74       78
11.51        38         99.81       79
13.57        39         99.865      80
15.87        40         99.903      81
18.41        41         99.931      82
21.19        42         99.952      83
24.20        43         99.966      84
27.43        44         99.977      85
30.85        45         99.984      86
34.46        46         99.9892     87
38.21        47         99.9928     88
42.07        48         99.9952     89
46.02        49         99.9968     90

T-scores are simply σ-scores in a normal distribution multiplied
by 10 and referred to an arbitrary reference point below
the mean in order to avoid negative signs. In the σ scaling of
items, the mean is taken at zero and σ is put equal to 1. The
point of reference, therefore, is zero and the unit of measurement
is one. Now if the point of reference is moved from the
mean of the normal curve to a point 5σ below the mean, this
new reference point becomes zero and the mean becomes five.
The σ divisions above the mean (+1σ, +2σ, +3σ, +4σ, and
+5σ) become 6, 7, 8, 9, and 10; and the σ divisions below the
mean (−1σ, −2σ, −3σ, −4σ, and −5σ) become 4, 3, 2, 1, and 0.
The σ of the distribution remains, of course, equal to 1, as shown
in Figure 38.

Relatively slight changes are needed in order to convert this
σ-scale into a T-scale. The T-scale begins at −5σ and ends at
+5σ. But σ is multiplied by 10, so that the σ divisions become
0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100, with the mean at 50.
The relationship of the T-scale to the ordinary σ-scale is shown
in Figure 38. Note that the T-scale ranges from 0 to 100; that its
unit (T) is 1 (i.e., .1 of σ, which is taken equal to 10); and that
the mean is 50. The reference point on the T-scale is set at −5σ
in order to have the scale cover exactly 100 units. This is
convenient, but it puts the extremes of the scale far beyond the
ability ranges of most groups. In actual practice, T-scores range
from about 15 to 85 (i.e., from −3.5σ to +3.5σ).

In Table 23, percents lying to the left of (below) succeeding
σ-points expressed as T-scores are tabulated, rather than percents
between the mean and given σ-points as in Table 17.
Table 23 is useful, therefore, in enabling one to read T-scores
directly, but the reader should note that T-scores can also be
computed from Table 17. We may illustrate with the score of 8
(Table 22), for which the percent below the score plus one-half
of the frequency on it is 87.1. A score failed by 87.1% lies
(87.1 − 50), or 37.1%, to the right of the mean. From Table 17
we read that 37.1% of the distribution lies between the mean
and 1.13σ. Since the σ of the T-scale is 10, 1.13σ becomes 11 in
T-units; and adding 11 to 50, the mean, we obtain 61 as the
required T-score (see Fig. 38).

[Fig. 38. To Illustrate σ-Scaling and T-Scaling in a Normal Distribution. Three parallel scales: σ-scale with zero point at the mean (−5 to +5); σ-scale with zero point at −5σ (0 to 10); T-scale with zero point at −5σ (0 to 100).]
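Table 23 and the Table 17 route can both be reproduced from the normal curve. This sketch assumes T = 50 + 10z; `percent_below`, `t_by_table`, and `t_by_table_17` are illustrative names of ours:

```python
from statistics import NormalDist

unit_normal = NormalDist()

def percent_below(t_score):
    """Percent of a normal distribution below a T-score (mean 50, sigma 10)."""
    return 100 * unit_normal.cdf((t_score - 50) / 10)

# Rebuild Table 23: percent below each whole T-score from 10 to 90.
table_23 = {t: percent_below(t) for t in range(10, 91)}

def t_by_table(pct):
    """Table 23 style: the T-score whose tabled percent is nearest pct."""
    return min(table_23, key=lambda t: abs(table_23[t] - pct))

def t_by_table_17(pct):
    """Table 17 style: find the sigma point with pct below it
    (equivalently, pct - 50 between it and the mean), then 50 + 10 * sigma."""
    z = unit_normal.inv_cdf(pct / 100)
    return 50 + 10 * z

print(t_by_table(87.1), round(t_by_table_17(87.1)))   # 61 61
```

Printing `table_23` reproduces the entries of Table 23 to the places shown there.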


T-scores are expressed in terms of the same unit and with
respect to the same reference point; and, unlike percentiles,
T-units are equal over the scale. T-scaling is superior to the
method of scaling separate items because the difficulty value of a
score is more stable than the difficulty value of a single item.
T-scales, too, have the advantage that scores ranging from 0 to 100
are more readily understood than are σ-scores expressed in other
units.


(2) A Comparison of T-scores and Standard (Z) Scores

T-scores are sometimes confused with standard scores, but
the assumptions underlying the computation of the two sorts
of measures are quite different. Table 24 repeats the original
data of Table 22 and shows the T-score equivalents to given
“raw” scores. Standard scores, denoted by Z, are listed in
column (4) for comparison with the T-scores. These Z scores
were calculated in the following way: The mean of the original
distribution is 5.73 and the σ is 1.72. Each score in the
distribution may be expressed as a σ-deviation from the mean.
Thus $\frac{3 - 5.73}{1.72} = \frac{-2.73}{1.72} = -1.59\sigma$; $\frac{4 - 5.73}{1.72} = \frac{-1.73}{1.72} = -1.01\sigma$;
and so on for the others. These σ-scores may be transformed into a
new distribution with any mean and σ we wish. Suppose we
set the “new” mean at 50 and the “new” σ at 10 (as in the
T-scale). Then the score of 3 is −1.59 × 10, or −16 units, from
50, or at 34; and score 4 is −1.01 × 10, or −10 units, from 50,
or at 40 (see Table 24).


TABLE 24

COMPARISON OF T-SCORES AND STANDARD (Z) SCORES
(Data from Table 22)

  (1)      (2)        (3)             (4)
  Test                           Standard (Z) Scores
  Score     f      T-Scores       M = 50, σ = 10
   10       1         74               75
    9       4         67               69
    8       6         61               63
    7      10         56               57
    6       8         52               52
    5      13         48               46
    4      18         41               40
    3       2         28               34
         N = 62

For test scores: M = 5.73, σ = 1.72.
Equation for converting test scores into standard scores (see p. 157):

$$\frac{X - 5.73}{1.72} = \frac{Z - 50}{10}$$
$$Z - 50 = \frac{10X - 57.3}{1.72} = 5.82X - 33.3$$
$$Z = 5.82X + 16.7$$
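The Z column of Table 24 can be reproduced with a linear transformation. This sketch rebuilds the 62 raw scores from the frequencies and assumes the book's σ is the population value (N in the denominator), which is what `statistics.pstdev` computes; `standard_score` is our name:

```python
from statistics import fmean, pstdev

# Frequencies from Table 22: test score -> number of subjects.
freq = {3: 2, 4: 18, 5: 13, 6: 8, 7: 10, 8: 6, 9: 4, 10: 1}
scores = [s for s, f in freq.items() for _ in range(f)]   # the 62 raw scores

mean = fmean(scores)      # about 5.73
sigma = pstdev(scores)    # about 1.72 (population sigma, N in the denominator)

def standard_score(x, new_mean=50, new_sigma=10):
    """Linear transformation: keeps the shape of the distribution and
    only changes its mean and sigma."""
    return new_mean + new_sigma * (x - mean) / sigma

for s in sorted(freq):
    print(s, round(standard_score(s)))
```

Rounded, the printed values match column (4). Comparing them with the T-scores shows where the two diverge most: at the low end of this skewed distribution (34 against 28 for the score of 3).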


The simplest plan for converting raw scores into σ-scores is
to set up an equation as shown in Table 24. Here
$\frac{X - 5.73}{1.72} = \frac{Z - 50}{10}$; that is, X-scores in the original
distribution, expressed as σ-deviations from 5.73, are equal to
Z-scores in the new distribution expressed as σ-deviations from 50.
Then $Z = 5.82X + 16.7$, and on substituting our X’s (i.e., 3, 4, 5,
etc.) we obtain equivalent Z's (i.e., 34, 40, 46, etc.). These
Z-scores correspond fairly closely to T-scores, and the more
“normal” the original distribution, the closer is the correspondence.
The two kinds of scores are not interchangeable, however. With
respect to the original scores, T-scores represent equivalent scores
in a normal distribution. Standard or Z-scores, on the other hand,
have the same form of distribution as the original scores, and are
simply original scores expressed in σ-units. Z-scores represent the
same kind of transformation we make when inches are changed into
centimeters, or pounds into kilograms. Both of these operations are
“linear transformations,”* and involve no assumption as to form of
distribution.


(3) Percentile Scaling . 

In percentile scaling, à child who makes a certain score upon 
& test is given a percentile rank of 27, 36, or 77, say, іп ac- 
cordance with his position in the distribution. When the dis- 
tribution of each of several tests has been drawn up, individual 
Scores may be readily translated into percentile ranks. These 
ranks may then be compared directly, or combined to give a 
final percentile ranking. The method of computing percentiles 
has already been considered (р. 77). Itis only necessary here, 
therefore, to show how percentile rankings may be compared, 


ог combined into a final score. 
Table 25 gives the percentile 


* 


distributions for nine-year-olds 


When th ati onnecting Z with, X is that of a straight line 
(the Mee деш line equation is у = Mt + b), changing X's 
into Z's involves a “linear transformation. 


158 STATISTICS IN PSYCHOLOGY AND EDUCATION 
upon three tests of the Pintner-Paterson series of performance 
tests.* 

TABLE 25
Percentile Distributions for Nine-Year-Olds on Three Tests
Method of Combining the Percentile Ranks of a Single Individual

                                        Percentiles                          S's     S's Perc.
Tests                 0    10   20   30   40   50   60   70   80   90  100   Score     Rank
Picture completion   62   240  297  325  372  407  440  450  499  577  646    445       65
Substitution          ?    ?    ?    ?    ?    ?    ?    ?    ?    ?    ?     126       70
Seguin form-board    34    24   21   20   18   18   17   16   15   15   13     17       60
Median percentile rank                                                                 65


The subject, a nine-year-old boy, made a score of 445 on the
completion test, which gives him a percentile rank of 65 (mid-
way between 60 and 70). On the substitution test, a score of
126 gives him a percentile rank of 70; and on the Seguin form-
board a score of 17 gives him a percentile rank of 60. The scores
on tests two and three are in time units (seconds), so that the
lowest score numerically represents the highest achievement.

The median of this subject's three percentile ranks is 65, which
indicates that he stands somewhat above the median of children
of his age in these tests. If this subject had been ten or eleven
years old, percentile distributions for these ages would, of course,
have been used. Percentile ranks may be combined directly
when such derived scores are expressed in comparable units.
Each test then has equal weight in the final score.
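Below is a sketch of reading a percentile rank from a row of Table 25 and then taking the median of the three ranks. Ranks are found by straight-line interpolation between the tabled percentiles. The helper `percentile_rank` is hypothetical, and it assumes that linear interpolation is acceptable between the 10-point steps. The substitution row is not legible in this copy, so that test's rank of 70 is taken from the text.

```python
from statistics import median

PERCENTILES = list(range(0, 101, 10))  # 0th, 10th, ..., 100th


def percentile_rank(score, norms):
    """Interpolate a percentile rank from the scores tabled at the 0th,
    10th, ..., 100th percentiles. `norms` may rise with percentile
    (point scores) or fall with it (time scores, where lower is better)."""
    points = list(zip(norms, PERCENTILES))
    for (s0, p0), (s1, p1) in zip(points, points[1:]):
        if min(s0, s1) <= score <= max(s0, s1):
            if s0 == s1:                     # tied norms: take the midpoint
                return (p0 + p1) / 2
            return p0 + (score - s0) * (p1 - p0) / (s1 - s0)
    raise ValueError("score lies outside the tabled range")


# Norm rows from Table 25 (the substitution row is not legible in this copy).
COMPLETION = [62, 240, 297, 325, 372, 407, 440, 450, 499, 577, 646]
SEGUIN = [34, 24, 21, 20, 18, 18, 17, 16, 15, 15, 13]   # seconds

if __name__ == "__main__":
    ranks = [
        percentile_rank(445, COMPLETION),   # completion test
        70,                                 # substitution rank, as stated in the text
        percentile_rank(17, SEGUIN),        # Seguin form-board
    ]
    print(ranks, "median:", median(ranks))
```

Interpolating across the decreasing Seguin row works the same way, because the slope simply turns negative when lower times mean better performance.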

Percentile scales assume that the difference between ranks of
10 and 20 is the same as the difference between ranks of 40 and
50; that is, percentile differences are taken to be equal through-
out the scale. This assumption holds strictly only when the
distribution of scores is in the form of a rectangle rather than in
the form of a normal curve. Figure 39 shows graphically the
difference between the two types of distribution. The figure
represents a rectangular distribution and a normal distribution
of the same area plotted over it. The rectangular distribution
has been divided into five equal parts or quintiles by taking suc-
cessive fifths of the area. Along the top of the rectangle, a
linear scale comprising five equal units is laid off. The width
of each small rectangle is the same: the distances from 0 to 20,
from 20 to 40, from 40 to 60, from 60 to 80, and from 80 to 100
are all equal. Now let us compare these equal percentile dis-
tances with the same percentile distances calculated from the
normal curve. The first 20% of area, counted off from the ex-
treme left of the normal curve, covers almost twice the distance
along the baseline of the curve as is occupied by the first 20%
of the rectangular distribution. This first 20% also covers
about four times as much of the baseline as the third 20%
(i.e., that from 40 to 60) in the normal curve. The baseline
extent covered by the first 20% in the normal curve has been
found in the following way. From Table 17 we find that the
30% of the area to the left of the mean extends from the mean
to point −.84σ. Hence, the first 20% of the normal distribu-
tion falls between −3.00σ and −.84σ. The second 20% lies
between −.84σ and −.25σ, since point −.25σ lies at a distance
of 10% from the mean. The third 20% lies between −.25σ and
.25σ. The fourth and fifth 20%'s occupy the same relative posi-
tions in the upper half of the curve as the second and first 20%'s
occupy in the lower half of the curve. It is clear that the steps
from 0 to 20 and from 20 to 40 are not equal when measured
along the baseline of the normal curve. Note that this in-
equality is relatively greater at the extremes of the distribution
than it is around the mean.

Fig. 39. To Illustrate the Position of the Same Five Percentiles in
Rectangular and Normal Distributions.

* Pintner, R., and Paterson, D. G., A Scale of Performance Tests (1925),
pp. 189 and 197.
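The quintile comparison can be checked numerically. This sketch assumes, as the text does, that the normal curve ends at ±3σ. It also assumes that the rectangle is laid over the same 6σ baseline. Python's `statistics.NormalDist.inv_cdf` (Python 3.8+) stands in for Table 17, and the function name `fifth_widths` is hypothetical.

```python
from statistics import NormalDist


def fifth_widths(limit=3.0):
    """Baseline width, in sigma units, of each successive 20% of a unit
    normal curve whose tails are taken to end at -limit and +limit."""
    unit = NormalDist()
    cuts = [-limit] + [unit.inv_cdf(k / 5) for k in range(1, 5)] + [limit]
    return [hi - lo for lo, hi in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    widths = fifth_widths()
    rectangle_fifth = 6.0 / 5   # a fifth of a rectangle on the same 6-sigma base
    print("normal-curve fifths:", [round(w, 2) for w in widths])
    print("first fifth / rectangle fifth:", round(widths[0] / rectangle_fifth, 2))
    print("first fifth / middle fifth:", round(widths[0] / widths[2], 2))
```

The first fifth of the normal curve spans about 2.16σ. That is almost twice the 1.2σ of a rectangle fifth, and about four times the roughly .51σ of the middle fifth.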

Since most distributions of test scores tend to be normal or
approximately so, equal percentile distances cannot usually be
taken to represent equal steps in difficulty throughout the per-
centile scale. Between Q1 and Q3, percentile ranks are approxi-
mately equally spaced. Percentile ranks of a child in two
different tests may be combined or averaged with little error
when they fall between these limits. But percentile ranks
greater than 75 or less than 25 should be combined, if at all, with
full knowledge of their limitations.


III. THE TRANSFORMATION OF MEASURES BY RELATIVE
POSITION INTO UNITS OF AMOUNT


1. Product Scales. The Conversion of Judgments of Relative
Merit into σ or PE Units

We have seen in the last section how test scores may be scaled
on the principle that the σ-value determined from the per-
centage passing a given item is an acceptable index of difficulty.
In scaling scores the assumption is made that ability is normally
distributed from poor to good, and that performance may be
scored quantitatively in terms of amount or time. It often
happens, however, that the ability or trait in which we are
interested is of such a nature that achievement cannot be ex-
pressed by a test score. This necessitates the construction of
what are called "product scales." On such scales excellence of
performance is evaluated by comparing an individual's produc-
tion with various "standard productions," the values of which
have been determined beforehand by a consensus of experts.


Handwriting, composition, and drawing scales are well-known
examples of product scales. The excellence of a person's pen-
manship, for example, can be determined by comparing a
sample of his writing with various specimens of handwriting,
the quality of which has been measured against some criterion.
Product scales are constructed on the principle that "equally
often noticed differences" in quality are equal. If composition
A, for example, is rated better than composition B by 75% of a
group of competent judges, and composition X is rated better
than composition Y by 75% of the same judges, then the
difference between A and B is taken to be the same as the
difference between X and Y (because equally often observed).

The assumption that "equally often noticed differences are
equal" has been criticized* and is most doubtful when applied
to the scaling of items at the extremes of the qualitative range.
The variability of judgments upon extremely good or extremely
poor specimens will ordinarily be less than the range of judg-
ments made upon intermediate specimens. In most product
scales the accurate measurement of these extreme specimens is,
perhaps, not so important as is the accurate scaling of those
items which constitute the main body of the scale. For this
reason, the assumption that equally often noticed differences
are equal will usually give scales which are as useful practically
as those resulting from the use of more refined techniques.

* Thurstone, L. L., "Equally Often Noticed Differences," Journal of
Educational Psychology, 18 (1927), 289-293.
Thurstone, L. L., "Psychophysical Analysis," American Journal of
Psychology, 38 (1927), 368-389.

Steps in constructing a product scale may be set down as
follows:

(1) Collect a large number of samples of the product to be
scaled (e.g., handwriting, drawings, jokes, pictures). These
specimens should range by gradual stages from very poor
to excellent.

(2) Persuade a number of competent persons to act as judges of
the comparative excellence of the specimens. These judges
are instructed to compare every specimen with every other
specimen, so that a consensus may be obtained on each.
The order of merit method, the paired comparisons method,
or some variation of these, should ordinarily be employed
here, as these experimental techniques provide a syste-
matic attack upon the problem of ranking samples for
excellence.*

(3) Reduce the number of times each specimen is ranked above
each other specimen to percentage terms, and express these
percents as σ-distances between each pair of specimens.
To illustrate, if drawing A is judged better than drawing B
by 65% of the group, A − B = .39σ; if B is judged better
than C by 77%, B − C = .74σ. These σ-differences are read
from Table 17 and are found in the following way. If a
sample is judged better than another by just 50%, there
is no observable difference between the two and their
σ-difference is zero. But if A is judged better than B by
65%, the difference between A and B (in excess of chance)
is 15%, which from Table 17 corresponds to a σ-difference
of .39. In exactly the same way the difference between B
and C (in excess of chance) is 27%, which corresponds to a
σ-difference of .74. Figure 40 shows graphically how per-
centage differences can be converted into σ-differences. The
distributions of judgments upon A, B, and C are assumed
to be normal and are taken to be equal in range and varia-
bility. The mean value of A (its scale value) is .39σ above
the mean value of B, whose mean value is, in turn, .74σ
above the mean value of C.

(4) Determine a difference for each pair of specimens, and ex-
press each item finally selected for the scale as so many
σ-units from the arbitrary zero. The procedure may be
illustrated by two items, numbers eight and nine, taken
from the Hillegas Composition Scale.† Hillegas had each
of 202 judges arrange a number of English compositions in
order of merit. An artificial composition was selected as
being of just zero merit, and assigned the value of 0 on the
scale. Of the 202 judges, 136 or 67.33% ranked specimen
nine as better than specimen eight. From Table 18, we
find that a percentage difference of 67.33 indicates a PE
difference of .65, and this value expresses the amount by
which nine is better than eight. The value of specimen
eight had already been found to be 7.72PE above the zero
point on the scale. Hence, specimen nine is 7.72 + .65 or
8.37PE above the zero composition. The values of the
nine compositions on the Hillegas Scale as measured in PE
units from the zero composition are 1.83, 2.60, 3.69, 4.74,
5.85, 6.75, 7.72, 8.37, and 9.37. Note that the steps on the
scale are fairly regular and are about 1PE apart.

Fig. 40. To Illustrate σ-Scale Differences between Specimens A, B, and C.
The distributions of judgments on the three specimens are
taken to be normal, and equal in range and variability.

* Woodworth, R. S., Experimental Psychology (1938), pp. 372-378.
† Hillegas, Milo B., A Scale for the Measurement of Quality in English
Composition by Young People, Teachers College Record, 18 (1912), 4, 5-55.
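Step (3) and the Hillegas illustration both reduce to an inverse-normal lookup. Here is a minimal sketch that uses `statistics.NormalDist.inv_cdf` in place of Tables 17 and 18 and takes 1 PE as .6745σ. The function names are illustrative only.

```python
from statistics import NormalDist

UNIT = NormalDist()
PE = UNIT.inv_cdf(0.75)   # probable error of a unit normal curve, about .6745 sigma


def sigma_difference(proportion_better):
    """Scale separation, in sigma units, between two specimens when one is
    judged better than the other by `proportion_better` of the judges
    (.50 means no observable difference)."""
    return UNIT.inv_cdf(proportion_better)


def pe_difference(proportion_better):
    """The same separation expressed in PE units."""
    return sigma_difference(proportion_better) / PE


if __name__ == "__main__":
    print(round(sigma_difference(0.65), 2))    # drawing A over drawing B
    print(round(sigma_difference(0.77), 2))    # drawing B over drawing C
    print(round(pe_difference(136 / 202), 2)) # Hillegas specimen nine over eight
```

The first two calls give .39σ and .74σ, as in step (3). For 136 of 202 judges (67.33%), the direct computation gives about .67PE rather than the .65 read from Table 18. A proportion of exactly .67 gives .65, so the gap presumably reflects rounding in the tabled lookup.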


2. The Transformation of Qualitative Data into Numerical
Scores

It is possible to represent many kinds of qualitative data in
quantitative terms, if we can assume that measures of the trait
or ability sampled by our data are normally distributed. Two
examples, which are typical of many, will be given by way of
illustration.


(1) The Scaling of Answers to a Questionnaire

The answers to the queries or statements in most question-
naires admit of several possible replies, such as Yes, No, ?; or
Most, Many, Some, Few, No; or there are four or five an-
swers, one of which is to be checked. It is often desirable to
"weight" these different alternatives in accordance with the
degree of divergence from the "typical answer" which they
indicate. Let us first assume that the attitude or other per-
sonality trait expressed in answering a given proposition is
normally distributed. From the percentage who accept each
alternative answer to a question or statement, we may then
find a σ-equivalent, which will express the value or weight to
be given that answer. Likert's* Internationalism Scale fur-
nishes an example of this scaling technique. This question-
naire contains twenty-four statements upon each of which the
subject is requested to give an opinion. Approval or disap-
proval of any statement is indicated by checking one of five
possibilities: "strongly approve," "approve," "undecided,"
"disapprove," and "strongly disapprove." The method of
scaling as applied to statement No. 16 on the Internationalism
Scale is shown in Table 26 on page 165. This statement reads
as follows:

16. All men who have the opportunity should enlist in the
Citizens' Military Training Camps.

Strongly approve    Approve    Undecided    Disapprove    Strongly disapprove

The percentage selecting each of the possible answers is shown
in the table. Below the percent entries are the σ-equivalents
assigned to each alternative on the assumption that opinion on
the question is normally distributed: that few will whole-
heartedly agree or disagree, and many take intermediate views.

* Likert, R., A Technique for the Measurement of Attitudes, Archives of
Psychology, No. 140 (1932).

TABLE 26
Data for Statement No. 16 of the Internationalism Scale

                       Strongly                                         Strongly
Answers                approve    Approve    Undecided    Disapprove    disapprove
Percent checking         13          43          21           13            10
Equivalent σ-values    −1.63       −.43         .43          .99          1.76
Z-scores                 34          46          54           60            68

The σ-values in Table 26 have been obtained from Table 27
(p. 167) in the following way. Reading down the first column,
headed 0, we find that, beginning at the upper extreme of the
normal distribution, the highest 10% has an average σ-distance
from the mean of 1.76. Said differently, the mean of the 10%
of cases at the upper extreme of the normal curve is at a distance
of 1.76σ from the mean of the whole distribution. Hence, the
answer "strongly disapprove" is given a σ-equivalent of 1.76
(see Fig. 41).

Fig. 41. To Illustrate the Scaling of the Five Possible Answers to
Statement 16 on Likert's Internationalism Scale.

To find the σ-value for the answer "disapprove," we select




01234 6 6 7 8 9 10 11 12 13 14 15 
1 270218 196 181 170 160 151 144 137 131 125 120 115 110 106 102 
2 244 207 189 175 165 156 148 141 134 128 122 118 112 108 104 99 
з 228 198 182 170 160 152 144 137 131 125 120 115 110 106 102 97 
4 216 191 177 165 156 148 141 134 128 123 118 113 108 104 100 96 
5 210 185 172 161 152 145 138 131 126 120 115 111 106 102 98 94 
6 199 179 167 157 149 141 135 129 123 118 113 108 104 100 96 92 
т 192174 103 153 145 138 132 126 121 116 111 106 102 98 94 90 
в 186 170 159 150 142 135 128 194 118 113 109 104 100 96 92 88 
9 181 105 155 147 139 133 126 121 116111106 102 98 94 90 86 
10 176 161 151 143 136 130 124 119 114 100 104 100 96 92 88 85 
11 171 158 148 140 134 127 122 116 111 107 102 98 94 90 87 83 
12 167 154 145 138 131 125 119 114 109 105 100 96 92 89 85 81 
13 163 151 142 135 128 122 117 112 107 103 99 94 91 87 83 80 
14 159 147 139 132 126 120 115 110 105 101 97 93 89 85 81 78 
15 156 144 136 129 123 118 113 108 103 99 95 01 87 83 80 76 
16 152 141 134 127 121 116 111 100 101 97 93 80 85 82 78 75 
17 149 130 131 125 119 113 109 104 09 95 91 87 84 80 77 73 
18 146 136 129 122 117 111 106 102 08 93 80 86 82 78 75 72 
19 143 133 126 120 114 109 105 100 96 92 88 84 80 77 73 70 
20 140 131 124 118112 107 103 98 94 90 86°82 70 75 72 69 
21 137 128 121 116 110 105 101 96 92 38 84 81 77 74 70 67 
22 135 126 119 113 108 103 99 95 90 87 83 79 76 72 60 66 
23 132 124 117 111 106101 97 92 80 S5 S1 78 74 71 07 04 
24 130 121 115 109 104 100 05 91 S7 83 80 76 73 69 00 63 
25 127119 113107 102 08 93 80 85 82 78 74 71 08 (4 01 
26 195117 111 105101 96 02 88 84 80 76 73 70 00 03 00 
27 123 115 109 104 90 04 90 86 82 78 75 71 68 05 02 58 
28 120113107102 97 92 88 84 80 77 72 70 67 63 00 57 
29 118 111 105 100 95 91 87 83 79 75 72 08 05 02 59 56 
зо 116109103 08 03 89 85 81 77 74 70 07 (4 60 57 54 
31 114107101 96 02 S7 83 79 76 72 69 65 62 50 50 53 
32 112105 99 94 90 86 82 78 74 71 67 64 61 58 54 51 
33 110103 98 03 88 84 80 76 73 60 00 03 50 56 53 50 
34 108101 96 91 86 82 79 75 71 68 64 61 58 55 52 49 




16 
97 
95 




24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 
1 69 00 63 60 57 51 51 45 45 43 40 37 35 32 20 27 
2 07 04 61 58 55 52 50 47 44 41 39 36 33 31 28 
3 66 63 60 57 51 51 48 45 43 40 37 35 32 29 27 
4 04 G1 58 55 52 50 47 44 41 39 30 33 31 28 25 
5 63 00 57 54 51 4S 45 43 40 37 35 32 29 27 24 
6 01 58 55 53 50 47 44 41 39 36 33 31 28 25 23 
т 51 48 45 43 40 37 35 32 29 27 24 21 
8 25 23 20 
9) 21 19 
20 18 
19 16 
18 5 
16 


TABLE 27

Average distance from the mean, in terms of σ, of each given percentage
of a normal distribution. Figures along the top of the table represent
percentages measured from either extreme. Figures down the side of the
table represent percentages measured from these points toward the mean.

Example: The average distance from the mean of the highest 10% of a
normal distribution is 1.76σ (entry opposite 10 in first column). The
average distance from the mean of the next 20% is .86σ (entry opposite
20 in column headed 10). The average distance from the mean of the
next 30% is

$$\text{Average distance from the mean} = \frac{.26 \times .20 + (-.13 \times .10)}{.30}$$

or .13σ (20% lie to the right of the mean and 10% to the left; see p. 165).


the column headed 10 and, running down the column, take the
entry opposite 13, namely, .99. This means that when 10% of
the distribution, reading from the upper extreme, have been ac-
counted for, the average distance from the mean of the next
13% is .99σ. Reference to Figure 41 will make this clearer.
Now from the column headed 23 (13% + 10% "used up" or
accounted for), we find entry .43 opposite 21. This means that
when the 23% at the upper end of the distribution have been
cut off, the mean σ-distance from the general mean of the next
21% is .43σ, which becomes the weight of the preference "unde-
cided." The weight of the fourth answer, "approve," must be
found by a slightly different process. Since a total of 44%
from the upper end of the distribution have now been accounted
for, 6% of the 43% who marked "approve" will lie to the right
of the mean, and 37% to the left of the mean, as shown in Figure
41. From the column headed 44 in Table 27, we take .08 (entry
opposite 6%), which is the average distance from the general
mean of the 6% lying just above the mean. Then from the
column headed 13 (50% − 37%) we take entry .51 (now −.51)
opposite 37%, as the mean distance from the general mean
of the 37% just below the mean. The algebraic sum

$$\frac{-.51 \times .37 + .08 \times .06}{.43} = -.43$$

is the weight assigned to the preference "approve." The 13% left,
those marking "strongly approve," occupy the 13% at the extreme
(low end) of the curve. Returning to the column headed 0, we find
that the mean distance from the general mean of the 13% at the ex-
treme of the distribution is −1.63σ.

In order to avoid negative values, each σ-weight in Table 26
can be expressed as a σ-distance from −3.00σ (or −5.00σ).
If referred to −3.00σ, the weights become in order 1.37, 2.57,
3.43, 3.99, and 4.76. Dropping decimals, and taking the first
two digits, we could also assign weights of 14, 26, 34, 40, and
48. Again, each σ-value in Table 26 may be expressed as a
Z-score. In a distribution the mean of which is 50 and the
σ 10, the category "strongly approve" is −16 (−1.63 × 10)
from the mean of 50, or at 34. Category "approve" is
−4 (−.43 × 10) from 50, or at 46. The other three categories
have Z-scores of 54, 60, and 68.
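Instead of reading Table 27, the mean σ-distance of any band of a normal curve can be computed directly. For a band running from a to b, it is (φ(a) − φ(b)) / (Φ(b) − Φ(a)), where φ is the normal density and Φ its cumulative area. The sketch below substitutes that formula for the table, which is an assumption and not the book's method. Its results may therefore differ from the printed values in the second decimal: the top 10%, for instance, comes out at about 1.755σ, printed as 1.76. The helper name `band_means` is hypothetical.

```python
from statistics import NormalDist


def band_means(proportions):
    """Mean sigma-distance from the general mean for successive bands of a
    unit normal curve. `proportions` lists the bands from the LOW end up
    and should sum to 1. For a band from a to b the mean is
    (pdf(a) - pdf(b)) / (cdf(b) - cdf(a))."""
    unit = NormalDist()
    densities = [0.0]               # density is 0 at the open low tail
    cumulative = 0.0
    for p in proportions[:-1]:
        cumulative += p
        densities.append(unit.pdf(unit.inv_cdf(cumulative)))
    densities.append(0.0)           # ... and at the open high tail
    return [(densities[i] - densities[i + 1]) / p
            for i, p in enumerate(proportions)]


# Statement No. 16 (Table 26), answers listed from the low end of the scale.
ANSWERS = ["strongly approve", "approve", "undecided",
           "disapprove", "strongly disapprove"]
PERCENT = [13, 43, 21, 13, 10]

if __name__ == "__main__":
    sigmas = band_means([pc / 100 for pc in PERCENT])
    for answer, s in zip(ANSWERS, sigmas):
        weight = s + 3.0            # sigma-distance from -3.00 sigma
        z_score = 50 + 10 * s       # mean 50, sigma 10
        print(f"{answer:20s} {s:6.2f} {weight:5.2f} {round(z_score):3d}")
```

Rounded, the Z-scores match Table 26 (34, 46, 54, 60, 68). Adding 3 to each σ-value gives, approximately, the weights referred to −3.00σ.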

When all of the twenty-four statements on the International-
ism Scale have been scaled as shown above, a person's "score"
(his attitude toward internationalism in general) is found by add-
ing up the weights assigned to the various preferences which he
has selected. An individual whose opinions are extreme, e.g., who
tends strongly to disapprove many statements, will receive a
proportionally larger total score when the choices are σ-scaled
than he would receive if the five possibilities were assigned
arbitrary weights of 1, 2, 3, 4, and 5. Likert has shown, how-
ever, that σ-scaling yields results which, for the test as a whole,
are little if any more reliable or more discriminatory than the
results obtained when the five answers are scored simply 1, 2, 3,
4, and 5. This virtual equality of scaling and rule-of-thumb
method is a rather familiar finding in mental measurement.
In the present instance, it probably arises from the fact that
the greater differentiation which the σ-scaling technique pro-
vides for single items is lost in the process of adding or averaging
the score weights from many items. A real advantage of σ-
scaling is that the units of the scale are equal and may be com-
pared from item to item or from scale to scale. Also, σ-scaling
gives a more accurate picture of the extent to which extreme or
biased opinions on a given question are divergent from the
typical opinion than does the arbitrary weighting method.


(2) The Scaling of Judgments or Ratings

In many psychological problems, individuals are rated or
ranked for their possession of characteristics or attributes not
readily measured by tests. Honesty, interest in one's work,
tactfulness, and originality are illustrations of such traits. Suppose
that two teachers, A and B, have rated a group of forty pupils
for "social responsibility" on a 5-point scale. A rating of 1
means that the trait is possessed in marked degree, a rating of
5 that it is almost if not entirely absent, and ratings of 2, 3,
and 4 indicate intermediate degrees. Assume that the per-
centage of children assigned each rating is as follows:


Social Responsibility

Rating      A       B
  1        10%     20%
  2        15%     40%
  3        50%     20%
  4        20%     10%
  5         5%     10%


It is obvious that B rates more leniently than A, so that a
rating of 1 by B does not represent the same degree of "social
responsibility" as a rating of 1 by A. Can we assign "weights"
or numerical scores so as to make the ratings of the two teachers
comparable? The answer is "yes," provided we can assume
that the distribution of the trait "social responsibility" is
normal, and that one teacher is as competent a judge as the
other. From Table 27, we may read σ-equivalents to the
percents given each rating by A and B as follows:


Rating       A        B
  1        1.76     1.40
  2         .95      .27
  3         .00     −.53
  4       −1.07    −1.04
  5       −2.10    −1.76


These σ-values are read from Table 27 in exactly the same way
as were the σ-equivalents in the previous problem (p. 165).
If we assume −3.00σ as an arbitrary reference point, the
σ-values for the ratings of A and B all become positive:


Rating A B 
1 4.76 4.40 
2 3.95 3.27 
3 3.00 2.47 
4 1.93 1.96 
5 .90 1.24 


Dropping decimals, and taking only the first two digits, A's
and B's ratings become:


Rating     A     B
  1       48    44
  2       40    33
  3       30    25
  4       19    20
  5        9    12


Or, expressed as Z-scores in a distribution with a mean of 50 and
a σ of 10:


Rating     A     B
  1       68    64
  2       60    53
  3       50    45
  4       39    40
  5       29    32


It is possible to combine the ratings of A and B by adding or
averaging them. If a child receives a rating of "4" by A
and a rating of "2" by B, his combined or average rating would
be (−1.07 + .27)/2 or −.40; (1.93 + 3.27)/2 or 2.60; (19 + 33)/2
or 26; (39 + 53)/2 or 46.

Table 27 will prove valuable in enabling one to transmute
many kinds of qualitative data into quantitative terms or scores.
Almost any attribute upon which relative judgments can be ob-
tained may be assigned scores in a normal distribution in terms
of the σ of the judgments.
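The same band-mean computation handles the two teachers' ratings. This sketch repeats a hypothetical `band_means` helper so that it runs on its own, and it assumes that rating 1 is the top of the scale. Because it computes the values rather than looking them up, a few of them differ slightly from those read from Table 27.

```python
from statistics import NormalDist


def band_means(proportions):
    """Mean sigma-distance of successive normal-curve bands, listed from the
    low end: (pdf(a) - pdf(b)) / (cdf(b) - cdf(a)) for a band from a to b."""
    unit = NormalDist()
    densities, cumulative = [0.0], 0.0
    for p in proportions[:-1]:
        cumulative += p
        densities.append(unit.pdf(unit.inv_cdf(cumulative)))
    densities.append(0.0)
    return [(densities[i] - densities[i + 1]) / p
            for i, p in enumerate(proportions)]


def rating_sigmas(percent_by_rating):
    """Sigma-value for ratings 1..5, where rating 1 is the top of the scale."""
    low_to_high = [pc / 100 for pc in reversed(percent_by_rating)]
    return list(reversed(band_means(low_to_high)))


TEACHER_A = [10, 15, 50, 20, 5]    # percent given ratings 1..5
TEACHER_B = [20, 40, 20, 10, 10]

if __name__ == "__main__":
    a, b = rating_sigmas(TEACHER_A), rating_sigmas(TEACHER_B)
    for rating, (sa, sb) in enumerate(zip(a, b), start=1):
        print(rating, round(sa, 2), round(sb, 2),
              round(50 + 10 * sa), round(50 + 10 * sb))
    # A child rated "4" by A and "2" by B: average the two Z-scores.
    combined = ((50 + 10 * a[3]) + (50 + 10 * b[1])) / 2
    print("combined Z-score:", round(combined))
```

Direct computation gives −2.06 for A's rating of 5, where Table 27 yields −2.10. A's rating of 2 (about .95σ) becomes a Z-score of 59 rather than 60. The averaged Z-score of the example child still comes out at 46.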


3. Changing Order of Merit Ranks into Numerical Scores

It is often desirable to transmute orders of merit into units
of amount or "scores." This may be done by means of tables,
if we are justified in assuming normality for the trait in which
the ranking has been made. To illustrate, suppose that
fifteen salesmen have been ranked in order of merit for selling
efficiency, the most efficient salesman being ranked 1, the least
efficient being ranked 15. If we are justified in assuming that
"selling efficiency" follows the normal probability curve, we
can, with the aid of Table 28 (p. 173), assign to each man a
"selling score" on a scale of 100 points. Such a score will
probably represent his ability as a salesman better than will a
rank of 2, 6, or 14. The problem may be stated specifically
as follows:
Example (1) Given fifteen salesmen, ranked in order of 
merit by their sales manager, to transmute these rankings 
into scores on a scale of 100 points. 


First, by means of the formula

$$\text{Percent position} = \frac{100(R - .5)}{N} \qquad (21)$$

(formula for transmuting ranks into percents)

in which R is the rank of the individual in the series* and N is
the number of individuals ranked, determine the "percent posi-
tion" of each man. Then from these percent positions read the
man's score on a scale of 100 points from Table 28. Salesman A,
who ranks No. 1, has a percent position of 100(1 − .5)/15 or 3.33,
and his score from Table 28 is 85 (finer interpolation unneces-
sary). Salesman B, who ranks No. 2, has a percent position of
100(2 − .5)/15 or 10, and his score, accordingly, is 75. The scores
of the other salesmen, found in exactly the same way, are given
in the table on page 174.
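Formula 21 and Table 28 can be sketched together. The code assumes, as the note on Table 28 states, a normal curve cut off at ±2.5σ whose 5σ baseline is divided into 100 steps of .05σ. Rounding the computed score to the nearest whole number stands in for reading the nearest entry in the table. All names are hypothetical, and the computed percents agree with the printed table only to within a few hundredths.

```python
from statistics import NormalDist

UNIT = NormalDist()
CUT = 2.5                      # curve cut off at +/-2.5 sigma
TAIL = 1.0 - UNIT.cdf(CUT)     # area of a full normal curve beyond +2.5 sigma
STEP = 0.05                    # one score unit = .05 sigma (5 sigma / 100)


def percent_position(rank, n):
    """Formula 21: percent position = 100(R - .5) / N."""
    return 100.0 * (rank - 0.5) / n


def table28_percent(score):
    """Percent of the truncated curve lying above the point scored `score`."""
    z = STEP * (score - 50)
    return 100.0 * (1.0 - UNIT.cdf(z) - TAIL) / (1.0 - 2.0 * TAIL)


def score_from_percent(pct):
    """Inverse of table28_percent: the unrounded score whose point has
    `pct` percent of the truncated curve above it."""
    area_above = TAIL + (pct / 100.0) * (1.0 - 2.0 * TAIL)
    return 50.0 + UNIT.inv_cdf(1.0 - area_above) / STEP


def rank_to_score(rank, n):
    return round(score_from_percent(percent_position(rank, n)))


if __name__ == "__main__":
    n = 15
    for rank in range(1, n + 1):
        pp = percent_position(rank, n)
        # the PR column of the salesmen table equals 100 minus percent position
        print(rank, round(pp, 2), rank_to_score(rank, n), round(100 - pp))
```

For N = 15 the rounded scores reproduce the salesmen's column (85, 75, 69, and so on down to 15), and 100 minus the percent position gives their PR's.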

It has been frequently pointed out that the assumption of
normality in a trait implies that differences at the extremes of
the trait are relatively much greater than differences around the
mean. This is clearly brought out in the next table; for, while
all differences in the order of merit series equal 1, the differences
between the transmuted scores vary considerably. The greatest
differences are found at the ends of the series, the smallest in
the middle. For example, the difference in score between A and
B or between N and O is three times the difference between G
and H. Clearly it is three times as hard for a salesman to im-
prove sufficiently to move from second to first place as it is
for him to improve sufficiently to move from eighth to seventh
place.

* A rank is an interval on a scale; .5 is subtracted from each R because
its midpoint best represents an interval. E.g., R = 5 is the 5th interval,
namely 4-5, and 4.5 (or 5 − .5) is the midpoint.

TABLE 28
The Transmutation of Orders of Merit into
Units of Amount or "Scores"*

Example: If N = 25 and R = 3, Percent Position is 100(3 − .5)/25 or 10
(formula 21), and from the table the equivalent score is 75, on a scale of
100 points.

Percent   Score     Percent   Score     Percent   Score
   .09      99       22.32      65       83.31      31
   .20      98       23.88      64       84.56      30
   .32      97       25.48      63       85.75      29
   .45      96       27.15      62       86.89      28
   .61      95       28.86      61       87.96      27
   .78      94       30.61      60       88.97      26
   .97      93       32.42      59       89.94      25
  1.18      92       34.25      58       90.83      24
  1.42      91       36.15      57       91.67      23
  1.68      90       38.06      56       92.45      22
  1.96      89       40.01      55       93.19      21
  2.28      88       41.97      54       93.86      20
  2.63      87       43.97      53       94.49      19
  3.01      86       45.97      52       95.08      18
  3.43      85       47.98      51       95.62      17
  3.89      84       50.00      50       96.11      16
  4.38      83       52.02      49       96.57      15
  4.92      82       54.03      48       96.99      14
  5.51      81       56.03      47       97.37      13
  6.14      80       58.03      46       97.72      12
  6.81      79       59.99      45       98.04      11
  7.55      78       61.94      44       98.32      10
  8.33      77       63.85      43       98.58       9
  9.17      76       65.75      42       98.82       8
 10.06      75       67.58      41       99.03       7
 11.03      74       69.39      40       99.22       6
 12.04      73       71.14      39       99.39       5
 13.11      72       72.85      38       99.55       4
 14.25      71       74.52      37       99.68       3
 15.44      70       76.12      36       99.80       2
 16.69      69       77.68      35       99.91       1
 18.01      68       79.17      34      100.00       0
 19.39      67       80.61      33
 20.83      66       81.99      32

* From Hull, C. L., The Computation of Pearson's r from Ranked Data,
Journal of Applied Psychology (1922), 6, pp. 385-390.

The percentile ranks (PR's) of our fifteen salesmen are also
given in the table for comparison with the normal curve
"scores." PR's were calculated by the method given on
page 80. Note that the steps between PR's are all equal;
there is no difference between the PR steps at intermediate and
at extreme positions. Both ranks and PR's assume that the
distribution of ability is rectangular rather than normal in
form (p. 159). Equal slices of area correspond directly to equal
distances along the baseline.

Salesman   Order of Merit   Percent Position   Score (Scale 100)   PR
   A              1                3.33                85            97
   B              2               10.00                75            90
   C              3               16.67                69            83
   D              4               23.33                64            77
   E              5               30.00                60            70
   F              6               36.67                57            63
   G              7               43.33                53            57
   H              8               50.00                50            50
   I              9               56.67                47            43
   J             10               63.33                43            37
   K             11               70.00                40            30
   L             12               76.67                36            23
   M             13               83.33                31            17
   N             14               90.00                25            10
   O             15               96.67                15             3

Another use to which Table 28 may be put is in the combina-
tion of incomplete order of merit rankings. To illustrate:


Example (2) Six persons, A, B, C, D, E, and F, are to be 
ranked for honesty by three judges. Judge 1 knows all six well 
enough to rank them; Judge 2 knows only three well enough 
to rank them; and Judge 3 knows four well enough to rank 
them. Can we obtain a fair composite order of merit ranking 
for all six persons by combining these three sets of rankings, 
two of which are incomplete? 


APPLICATIONS OF NORMAL PROBABILITY CURVE 175 


We may tabulate our data as follows: 


Persons 
A B с р Е Е 
Judge 1’s ranking 1 2 3 4 5 6 
Judge 2's ranking 2 1 3 
Judge 3's ranking 2 1 3 4 


It seems fair that A should get more credit for ranking first
in a list of six than D for ranking first in a list of three, or C
for ranking first in a list of four. In the order of merit ratings,
all three individuals are given the same rank. But when we
assign scores to each person, in accordance with his position in
the list, by means of formula 21 and Table 28, A gets 77 for his
first place, D gets 69 for his, and C gets 73 for his. See the table
below:


                          Persons
                        A     B     C     D     E     F
Judge 1's ranking       1     2     3     4     5     6
  Score                77    63    54    46    37    23
Judge 2's ranking             2           1           3
  Score                      50          69          31
Judge 3's ranking       2           1           3     4
  Score                56          73          44    27
Sum of scores         133   113   127   115    81    81
Mean                   67    57    64    58    41    27
Order of merit          1     4     2     3     5     6


All of the ratings have been transmuted as shown in example
(1) above. Separate scores may be combined and averaged to
give the final order of merit shown in the table.
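The whole procedure for Example (2) can be sketched in a few lines. This is a minimal sketch that makes the same assumptions about Table 28 as above: a normal curve cut at ±2.5σ, with score units of .05σ. The dictionary layout and function names are hypothetical. The rankings are those shown in the tables, and means are rounded half up, as in the table.

```python
from __future__ import annotations

from statistics import NormalDist

N01 = NormalDist()
TAIL = 1 - N01.cdf(2.5)      # area cut off beyond +2.5 sigma
INSIDE = 1 - 2 * TAIL        # area of the truncated curve


def rank_to_score(rank: int, n: int) -> int:
    """Formula 21 followed by a Table 28 lookup, computed from the normal curve."""
    p = (rank - 0.5) / n
    z = N01.inv_cdf(1 - (TAIL + p * INSIDE))
    return int(50 + z / 0.05 + 0.5)          # round half up (scores are positive)


def composite_order(rankings: dict[str, dict[str, int]]):
    """Transmute each judge's (possibly incomplete) ranks, then average per person."""
    scores: dict[str, list[int]] = {}
    for ranks in rankings.values():
        n = len(ranks)                       # length of this judge's own list
        for person, rank in ranks.items():
            scores.setdefault(person, []).append(rank_to_score(rank, n))
    means = {p: int(sum(s) / len(s) + 0.5) for p, s in scores.items()}
    order = sorted(means, key=means.get, reverse=True)
    return scores, means, order


rankings = {
    "Judge 1": {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6},
    "Judge 2": {"D": 1, "B": 2, "F": 3},
    "Judge 3": {"C": 1, "A": 2, "E": 3, "F": 4},
}

if __name__ == "__main__":
    scores, means, order = composite_order(rankings)
    for person in order:
        print(person, scores[person], sum(scores[person]), means[person])
```

The key design point is that each rank is transmuted against the length of the list it came from. A first place among three judged persons therefore earns less than a first place among six.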

By means of formula 21 and Table 28 it is possible to trans-
mute any set of ranks into scores, if we may assume a normal
distribution in the trait for which the ranking is made. The
method is useful in the case of those attributes which are not
easily measured by ordinary methods, but for which individuals
may be arranged in order of merit, as, for example,
athletic ability, personality, beauty, and the like. It is also
valuable in correlation problems when the only available
criterion* of a given ability or aptitude is a set of ranks.
Transmuted scores may be combined or averaged like other test
scores.

* For definition of a criterion, see Chapter XII, p. 394.


176 STATISTICS IN PSYCHOLOGY AND EDUCATION

A word of explanation may be added with regard to Table 28.
This table represents a normal frequency distribution which has
been cut off at ±2.5σ. The baseline of the curve is 5σ, there-
fore, and may conveniently be divided into 100 parts, each
.05σ long. The first .05σ from the upper limit of the curve
takes in .09 of 1% of the distribution and is scored 99 on a scale
of 100. The next .05σ (which brings us .10σ from the upper end
of the curve) takes in, together with the first, .20 of 1% of the
entire distribution and is scored 98. In each case, the percent
position gives the fractional part of the normal distribution
which lies to the right of (above) the given “score.”
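The construction of Table 28 described above can be sketched as follows. This is a hedged reconstruction, not the printed table. It assumes that each score s marks the point (s − 50) × .05σ from the mean. It also assumes that the area above that point, measured down from the +2.5σ cut-off, is expressed as a percent of the truncated curve. The function name is hypothetical. Under these assumptions the first two entries come out at .09 and .20 of 1%, as stated in the text.

```python
from statistics import NormalDist

N01 = NormalDist()
TAIL = 1 - N01.cdf(2.5)    # area lying beyond the +2.5 sigma cut-off
INSIDE = 1 - 2 * TAIL      # area of the curve between -2.5 and +2.5 sigma


def percent_above(score: int) -> float:
    """Percent of the truncated curve between a score's point and the upper cut-off."""
    z = (score - 50) * 0.05               # each unit on the 100-point scale is .05 sigma
    area = (1 - N01.cdf(z)) - TAIL        # area between z and +2.5 sigma
    return 100 * area / INSIDE


if __name__ == "__main__":
    for score in range(99, 89, -1):
        print(score, f"{percent_above(score):.2f}")
```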


PROBLEMS 


1. In a sample of 1000 cases the mean of a certain test is 14.40 and
σ is 2.50. Assuming normality of distribution,
(a) How many individuals score between 12 and 16?
(b) How many score above 18? below 8?
(c) What are the chances that any individual selected at random
will score above 15?

2. In a distribution of 100 cases, the median is 29.74 and the Q is
3.18. Assuming normality,
(a) What percent of the cases lie between 24 and 25?
(b) What limits include the middle 60%?
(c) What limits include the lowest 5%?


3. In a certain achievement test, the seventh-grade median is 28.00,
with a Q of 4.80; and the eighth-grade median is 31.60 with a Q 
of 4.00. What percent of the seventh grade is above the median 
of the eighth grade? What percent of the eighth grade is below 
the median of the seventh grade? 

4. Two years ago a group of twelve-year-olds had a reading ability
expressed by a mean score of 40.00 and a σ of 3.60; and a
composition ability expressed by a mean of 62.00 and a σ of 9.60. Today
the group has gained 12 points in reading and 10.8 points in
composition. How many times greater is the gain in reading than
the gain in composition?




5. In Problem 1, Chapter IV, we computed directly from the
distribution the percent of Group A which exceeds the median of Group B.
Compare this value with the percentage of overlapping obtained
on the assumption of normality in Group A.


6. Four problems, A, B, C, and D, have been solved by 50%, 60%,
70%, and 80%, respectively, of a large group. Compare the
difference in difficulty between A and B with the difference in
difficulty between C and D.


7. In a certain college, ten grades, A+, A, A−; B+, B, B−; C+,
C, C−; and D, are assigned. If ability in mathematics is
distributed normally, how many students in a group of 500 freshmen
should receive each grade?


8. Five problems are passed by 15%, 34%, 50%, 62%, and 80%,
respectively, of a large unselected group. If the zero point of ability
in this test is taken to be at −3σ, what is the σ-value of each
problem as measured from this point?


9. (a) Locate the deciles in a normal distribution in the following
way. Beginning at −3σ, count off successive 10%'s of area up
to +3σ. Tabulate the σ-values of the points which mark off
the limits of each division. For example, the limits of the first
10% from −3σ are −3.00σ and −1.28σ (see Table 17,
p. 115). Label these points in order from −3σ as .10, .20, etc.
Now compare the distances in terms of σ between successive
ten percent points. Explain why these distances are unequal.
(b) Divide the baseline of the normal probability curve (take as 6σ)
into ten equal parts, and erect a perpendicular at each point of
division. Compute the percentage of total area comprised by
each division. Are these percents of area equal? If not,
explain why. Compare these percents with those found in (a).


10. In a large group of competent judges, 88% rank composition A as
better than composition B; 65% rank B as better than C. If C
is known to have a PE value of 3.50 as measured from the "zero
composition," i.e., the composition of just zero merit, what are the
PE values of B and A as measured from this zero point?


11. Twenty-five men on a football squad are ranked by the coach in
order of merit from 1 to 25 for all-around playing ability. On the
assumption that general playing ability is normally distributed,
transmute these ranks into "scores" on a scale of 100 points.
Compare these scores with the PR's of the ranks.


12. On an Occupational Interest Blank, each occupation is followed
by five symbols, L! L ? D D!, which denote different degrees of
"liking" and "disliking." The answers to one item are distributed
as follows:

     L!      L       ?       D       D!
     8%     20%     38%     24%     10%

(a) By means of Table 27 convert these percents into σ-units.
(b) Express each σ-value as a distance from "zero," taken at −3σ,
and multiply by 10 throughout.
(c) Express each σ-value as a Z-score in a distribution of mean
50, σ 10.


13. Letter grades are assigned to three classes by their teachers in
English, history, and mathematics, as follows:

Mark     English   History   Mathematics
A           25        11          6
B           21        24         15
C           32        20         25
D            6         8         20
F            1         2          8
Total       85        65         74

(a) Express each distribution of grades in percents, and by means
of Table 27 transform these percents into σ-values.
(b) Change these σ-values into 2-digit numbers and into Z-scores
following the method on page 171.
(c) Find average grades [from (b)] for the following students:

Student   English   History   Mathematics
S. H.        A         B          C
E. M.        C         B          A
D. B.        B         D          F


14. Calculate T-scores in the following problem:

                  Percent below given score
Score      f      plus one-half reaching        T-score
 91        2             .995                      76
 90        4             .980                      71
 89        6
 88       20
 87       24
 86       28
 85       40
 84       36
 83       24
 82       12
 81        4
         200

(The first two T-scores have been entered.)
ANSWERS 
1. (a) 570
   (b) 75; 5
   (c) 41 in 100
2. (a) 5%
   (b) 33.72 and 25.76
   (c) 21.95 and lowest score in the distribution
3. 31%; 27%
4. Three times as great.
5. 39% as compared with 42%.
6. Difference between A and B is .25σ; between C and D, .32σ.
7. Grades:              A+    A    A−    B+    B    B−    C+    C    C−    D
   Students receiving:   3   14    40    80   113   113   80   40    14    3
8. In order: 4.04; 3.41; 3.00; 2.69; 2.16.
9. (a) Point:     .00    .10    .20    .30    .40    .50    .60    .70    .80    .90   1.00
       σ-value: −3.00  −1.28   −.84   −.52   −.25     0     .25    .52    .84   1.28   3.00
       Diffs:        1.72    .44    .32    .27    .25    .25    .27    .32    .44   1.72
   (b) Percents of area in order: .68; 2.77; 7.92; 15.92; 22.57;
       22.57; 15.92; 7.92; 2.77; .68.


10. B, 4.05PE; A, 5.80PE.

11. Rank:    1   2   3   4   5   6   7   8   9  10  11  12  13
    Score:  89  80  75  71  68  65  63  60  58  56  54  52  50
    PR's:   98  94  90  86  82  78  74  70  66  62  58  54  50
    Rank:   14  15  16  17  18  19  20  21  22  23  24  25
    Score:  48  46  44  42  40  37  35  32  29  25  20  11
    PR's:   46  42  38  34  30  26  22  18  14  10   6   2


12.          L!       L       ?       D       D!
    (a)   −1.86    −.94    −.08     .80    1.76
    (b)      11      21      29      38      48
    (c)      31      41      49      58      68
13.               F       D       C       B       A
    (a) English  −2.70   −1.74    −.65     .28    1.18
        History  −2.28   −1.38    −.53     .39    1.49
        Math.    −1.71    −.71       3     .94    1.86
    (b)     English           History           Mathematics
          −3.00σ    Z       −3.00σ    Z       −3.00σ    Z
      A      42     62         45     65         49     69
      B      32     52         34     54         39     59
      C      24     44         25     45         31     51
      D      13     33         16     36         23     43
      F       3     23          7     27         13     33
    (c) S. H., 36 or 56; E. M., 36 or 56; D. B., 20 or 40.
14. T-scores:
    76, 71, 67, 62, 58, 54, 49, 44, 39, 34, 27


CHAPTER VII 


SAMPLING AND RELIABILITY 


I. THE MEANING OF RELIABILITY


The “true” mean or the “true” σ of any set of measurements
(of height, mechanical aptitude, or intelligence, for example) is
that value found by taking into account the scores made by all
of the members of some defined group (called the population).
It is rarely if ever possible to measure all of the individuals in a
given population, say all of the ten-year-old boys in New York
City. Hence we must usually be content to deal with “samples”
drawn from our population, and owing to slight differences in
the composition of these samples, means and σ's may be somewhat
larger or somewhat smaller than their corresponding
population values. True and obtained measures are referred to,
respectively, as population parameters and sample statistics.*
Sample statistics are always estimates of their population
counterparts; and the accuracy of this estimate is a measure of
the reliability of the statistic.
Although we may not be able to determine the parameters
(true values) themselves, we can compute limits within which the
true mean or some other statistic may, with a certain degree of
confidence, be expected to lie. As we shall see later, this range,
which may be large or small, serves as a useful index of the
reliability or dependability of the calculated statistic. Whenever
we have calculated a statistic, then, we must ask ourselves
these questions: “How reliable is my answer?” “How well
does this mean or σ represent the true value which I should
get by taking into account the entire population from which
my sample was drawn?” The purpose of this chapter is to
present methods which will enable us to answer these questions.
The reliability of measures of central tendency will be considered
first; then the reliability of measures of variability and of
certain other important statistics; and finally the reliability
of the differences between obtained measures.

* A statistic is any measure calculated from a sample as, for example,
the mean or SD.


II. THE RELIABILITY OF MEASURES OF CENTRAL TENDENCY


1. The Reliability of the Mean
(1) The Standard Error (SE) of the Mean (σ_M)

What is meant by the reliability of the mean can best be
seen by examining the factors upon which the stability of this
measure depends. Suppose that we wish to know the mean
ability of college freshmen in the United States as shown by
their scores upon the American Council Psychological Examination.
To measure the achievement of college freshmen in
general would require in strict logic that we test all of the freshmen
in the United States. But this is obviously a stupendous
task, and we must perforce be satisfied with taking the records
of as large and as representative* a sample of freshmen as we can
find. This means that we cannot use freshmen from only a
single institution or from only one section of the country; and
that we must guard against selecting only those with high, or
only those with low, scholastic records. The more successful
we are in getting an “unselected” group, the more representative
this group will be of all freshmen in the country. Evidently,
therefore, the reliability of a mean depends for one
thing upon how impartially we have chosen our sample.

Given an adequate sample, the reliability of a mean can be
shown to depend mathematically† upon two characteristics of
the distribution: (1) the number of cases (N) and (2) the
variability or spread of the measures.


* For further discussion of sampling, see pp. 222-227.
† Kelley, T. L., Statistical Method (1923), pp. 82-83.


SAMPLING AND RELIABILITY 183 


(a) It is clear that the number of cases must influence the
stability of a mean, since the addition of even one extra measure
to a series will change the mean unless the additional case happens
to coincide with the mean exactly. Moreover, the addition
of one score to a set of ten scores will effect a greater change
in the obtained mean than the addition of one score to a set
of 1000 scores, as each case counts for less in the larger group.
It can be shown mathematically, as well as experimentally,*
that the reliability of a sample mean will increase, not in
proportion to the number of measures upon which it is based, but
in proportion to the square root of the number of measures.
The mean obtained from twenty-five scores, for example, is not
twenty-five times, but √25 or five times, as reliable as a single
measure. And a mean based upon thirty-six cases is not four
times as reliable as a mean based upon nine cases, but only
twice as reliable, since √36 divided by √9 equals 2.

(b) Reliability of a mean also depends upon the variability
of the separate measures around the mean. If the σ of the
distribution is large, the separate measures tend to scatter widely,
and we are unable to say where those cases in the population
which we have not measured will most probably fall, whether
they will be close to, or far from, the mean. On the other hand,
if the σ is small, we may be fairly certain that unmeasured
cases will fall close to the mean. The reliability of an obtained
mean, therefore, varies with the size of the σ; as σ increases,
the reliability decreases.

To summarize, the reliability of a mean depends first upon our
having drawn an unbiased sample from the larger group or
population which we are studying. When this condition has
been met, and only then, the reliability of a mean is measured
mathematically by its standard error, which is based upon N
(the number of cases) and the σ of the distribution. The
formula for the standard error of the mean is

* Yule, G. U., An Introduction to the Theory of Statistics (10th ed.,
1932), p. 257. For results of experiment, see Thorndike, E. L., Empirical
Studies in the Theory of Measurement, Archives of Psychology, 3 (1907),
1-13.


$$\mathrm{SE}_M = \sigma_M = \frac{\sigma}{\sqrt{N}} \qquad (22)$$


(the standard error of the arithmetic mean when N is large)*


This is an important and much-used formula. The standard
error of the mean measures the extent to which this statistic
is affected by errors of measurement (p. 398) as well as by
differences which arise by chance from sample to sample. A
decrease in σ or an increase in N will cause the standard error
to become smaller numerically. A decrease in σ_M means that
the amount by which the obtained mean probably misses the
mean of the population is just so much less. In short, the
reliability of an obtained mean increases as σ_M decreases.

A problem will illustrate the use and interpretation of 
formula (22). 


Example (1)† In 1883, the Anthropometric Committee of
the British Association found the mean height of 8585 adult
males in the British Isles to be 67.46 inches, with a σ of 2.57
inches. How reliable is this mean? How much does it probably
diverge from the mean which would have been obtained
had all adult males in the British Isles been measured?


We cannot answer these questions precisely when the value
of the true mean is unknown (as here). But we can give a
satisfactory answer provided we are willing to be in error once in
100 trials, or five times in 100 trials, or provided we are willing
to take some other stipulated risk. Statisticians usually state
the risk of error which they are willing to assume in a given
investigation, and their degree of confidence depends upon the
“chances” they are willing to take (p. 187).

We know that our sample mean is 67.46 inches. Hence it is 
certain that 67.46 is one of the possible values that might 
arise through a random sampling of the given population. But 


* Any N of 30 or more (50, to be conservative) may be considered
"large."

† Yule, G. U., An Introduction to the Theory of Statistics (10th ed.,
1932), pp. 88-89, 112 and 141.


other values could also have arisen, and from a knowledge of
sampling theory we can predict the probable range within which
all of these possible sample means will lie. If we are willing to
take the risk of being wrong five times in 100 trials, we can put
the lowest mean obtainable from a sample at 67.46 − 1.96σ_M
and the highest mean obtainable at 67.46 + 1.96σ_M. If we are
willing to take the risk of being wrong only once in 100 trials,
we must put the lowest mean obtainable from a sample at
67.46 − 2.58σ_M, and the highest mean obtainable at 67.46
+ 2.58σ_M. The reason for these limits (±1.96σ_M and ±2.58σ_M)
is that sampling fluctuations around the population mean are
known to follow the normal probability curve when samples are
random. From Table 17, page 115, we find that 95% of the
cases in a normal distribution fall between the limits ±1.96σ
(5% lying outside these limits); and 99% of the cases fall
between the limits ±2.58σ (1% lying outside these limits).
Now applying formula (22) we find the standard error of the
mean, σ_M, to be 2.57/√8585 or .028 inch. We can be confident,
therefore, to the extent of risking a wrong answer five times in
100 trials, that the range of sample means lies between 67.46
± 1.96 × .028, or 67.46 ± .05. The range of sample means from
lowest to highest is therefore from 67.41 to 67.51.
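The arithmetic of Example (1) can be checked with a few lines of Python. This is a minimal sketch. `se_mean` is a hypothetical helper that implements formula (22), and 1.96 and 2.58 are the normal-curve multipliers taken from Table 17.

```python
import math


def se_mean(sd: float, n: int) -> float:
    """Formula (22): standard error of the mean when N is large."""
    return sd / math.sqrt(n)


mean, sd, n = 67.46, 2.57, 8585          # Example (1): heights in inches
se = se_mean(sd, n)                      # about .028 inch

for level, z in ((".05", 1.96), (".01", 2.58)):
    margin = z * se
    print(f"{level} level: {mean - margin:.2f} to {mean + margin:.2f} (± {margin:.2f})")
```

The .05 line reproduces the range 67.41 to 67.51. The .01 line gives 67.39 to 67.53, that is, 67.46 ± .07.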

The reliability of our mean depends upon the fact that we
are quite confident that the true mean lies somewhere within this
relatively narrow range. But our confidence does not amount
to certainty, since the given result depends upon our willingness
to go wrong five times in 100 trials. If we wish to take a lesser
risk (are willing to go wrong only once in 100 trials), we may
conclude with greater confidence than before that the true mean
lies within the range 67.46 ± 2.58σ_M, or between 67.46 − .07
and 67.46 + .07. Since the range within which the population
parameter (true mean) probably falls is quite narrow in either
case, we conclude that our obtained mean cannot be very far
“off” from the true value, and that considerable confidence
may be placed in its adequacy.




How the standard error measures the reliability or stability 
of an obtained mean may be more clearly shown perhaps in 
the following way: Suppose that we have calculated the mean 
height of each of 100 groups of men; that each group contains 
8585 subjects; and that the groups or samples are drawn at 
random from the general population. The 100 means obtained
from these samples will tend to differ slightly from one an- 
other owing to errors of sampling, or sampling fluctuations. 
Hence, not all samples will represent with equal fidelity the 
population from which they have been drawn. It can be 
shown mathematically that the frequency distribution of 
these sample means will fall into a normal distribution around 
the “true” or population mean as their measure of central 
tendency. Even when the samples are not normally distrib- 
uted themselves, the means from such samples will tend to- 
ward a normal distribution. This "sampling distribution” 
of means measures the “errors” of sampling or fluctuations in 
mean values from sample to sample. In this hypothetical 
normal distribution of means we find relatively few large plus 
or minus deviations; and many small plus, small minus, and 
zero deviations. In short, the obtained means will hit very 
near to the true mean, or fairly close to it, more often than 
they will miss it by large amounts. 

The mean of our distribution of 100 means is the best estimate
of the “true” or population mean. And our best estimate
of the σ of this distribution of means is the standard error
of the mean which we have calculated. In other words, σ_M
measures the spread of sample means around the true or
population mean. It is because of this fact that the standard error
of the mean becomes a measure of the amount by which
any obtained mean probably diverges from the population mean.

The results of our hypothetical experiment are represented
graphically in Figure 42, page 187. The 100 sample means
are represented by a normal frequency distribution around
the TM (true mean), and σ_M is put equal to .028. The
heights of the different ordinates (y’s) represent the frequency
of the various sample means. That the true mean is the
most frequently obtained measure is shown by the fact that
the ordinate at the TM is the maximum ordinate. The σ of
a normal distribution, when measured off in the plus and minus
directions from the mean, includes the middle 68.26% of the
cases. About 68 of our 100 obtained means, therefore, may
be expected to miss the TM by not more than ±1σ_M (±.028
inch); and, since ±2σ takes in 95.44% of a normal distribution,
about 95 of our obtained means may be expected to miss the TM
by not more than ±2σ_M (±.056 inch). Since our mean of
67.46 inches is one of these obtained means, the probability is
approximately .95 that 67.46 inches does not miss the true
mean by more than ±.056 inch.
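The hypothetical experiment of drawing 100 samples can be simulated directly. This is a minimal sketch that assumes, purely for illustration, a normal population whose mean and σ equal the obtained 67.46 and 2.57 inches. The seed and the helper name are arbitrary choices. The SD of the 100 simulated means should come out close to σ_M = .028. Roughly 68 of the means should fall within ±1σ_M of the population mean and roughly 95 within ±2σ_M, although the exact counts vary with the seed.

```python
from __future__ import annotations

import random
import statistics


def sample_means(pop_mean: float, pop_sd: float, n: int, trials: int,
                 seed: int = 42) -> list[float]:
    """Means of `trials` random samples of size n drawn from a normal population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
            for _ in range(trials)]


pop_mean, pop_sd, n = 67.46, 2.57, 8585
means = sample_means(pop_mean, pop_sd, n, trials=100)

se_formula = pop_sd / n ** 0.5               # formula (22), about .028
se_observed = statistics.stdev(means)        # actual spread of the 100 sample means
within_1 = sum(abs(m - pop_mean) <= se_formula for m in means)
within_2 = sum(abs(m - pop_mean) <= 2 * se_formula for m in means)

print(f"SD of sample means: {se_observed:.4f}  (formula 22: {se_formula:.4f})")
print(f"within ±1 SE: {within_1} of 100; within ±2 SE: {within_2} of 100")
```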


[Figure 42: normal curve of sample means centered on the true mean (TM); baseline marked at −.084, −.056, −.028, TM, .028, .056, .084 inch, i.e., ±1, ±2, and ±3σ_M.]

Fig. 42. Sample Distribution of Means Showing Variability of
Obtained Means around the True or Population Mean (TM)
in Terms of σ_M (.028).


(2) The PE of the Mean (PE_M)
The reliability of a mean may be determined by PE instead
of by σ_M. PE_M is obtained by multiplying σ_M by .6745 (see
p. 119). Thus

$$PE_M = \frac{.6745\,\sigma}{\sqrt{N}} \qquad (23)$$

(the probable error of the arithmetic mean when N is large,
i.e., greater than 50)
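As a worked illustration that is not part of the original, formula (23) applied to the height data of Example (1) gives the following, with σ_M ≈ .028 inch as computed above:

$$PE_M = \frac{.6745 \times 2.57}{\sqrt{8585}} \approx .6745 \times .0277 \approx .019 \text{ inch}$$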


(3) Determining Limits of Accuracy

As we have seen, the reliability of an obtained mean will
depend upon the likelihood of its having missed the true value
by a large or small amount. An obvious difficulty in statements
concerning reliability arises from our inability to say just how
large the probable deviation of sample from population mean
should be before it is to be judged “large.” The sampling error
allowable in a mean depends upon the purpose of the experiment,
the standards of accuracy set up, the units in terms of
which measurement is made, and other factors.* An experimenter
can never state categorically that his computed mean is,
or is not, "reliable." But he can set up definite limits
within which he may be quite confident of his result. Degree
of confidence will depend upon the limits imposed. Fisher has
proposed two accuracy limits, called respectively the .05 and
the .01 levels (p. 201), and these may be accepted as standard
for most experimental work.* We know from Table 17 that
95% of the cases in a normal distribution lie within the limits
±1.96σ. Hence, the odds are 95:5 or 19:1 that any sample
mean will lie within these limits. Furthermore, since 99% of the
cases in a normal distribution lie within the limits ±2.58σ,
the odds are 99:1 that any sample mean will not differ from
the population mean by more than ±2.58σ_M. In our height
problem on page 184, we were able to say with considerable
confidence (the odds are 19:1) that 67.46 inches does not differ
from the true mean height by more than ±.05 inch. And we
could say with still greater confidence (the odds are 99:1) that
67.46 inches does not differ from the true mean by more than
±.07 inch. The two limits, .05 and .01, mark off or define
confidence intervals, the .01 level deserving greater respect than
the .05 level.

* Garrett, H. E., “Mean Differences and Individual Differences,”
Human Biology, 15 (1943), 155-170.
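The origin of the multipliers 1.96 and 2.58, and of the odds 19:1 and 99:1, can be sketched with the standard library's `statistics.NormalDist`. The function name `two_tailed_z` is hypothetical.

```python
from statistics import NormalDist


def two_tailed_z(level: float) -> float:
    """z such that the proportion `level` of a normal curve lies beyond ±z."""
    return NormalDist().inv_cdf(1 - level / 2)


for level in (0.05, 0.01):
    z = two_tailed_z(level)
    odds = (1 - level) / level
    print(f"{level:.2f} level: ±{z:.2f} sigma_M, odds about {odds:.0f}:1")
```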


(4) The SE of the Mean in Small Samples
Modern writers on statistics make a distinction between the
standard deviation of the population and the standard deviation
of a sample drawn from this population, often designating the
population SD by σ and the sample SD by s. It can be
shown mathematically† that the sample SD systematically
underestimates (is smaller than) the population σ, and this
underestimation is more severe when samples are small. To
correct this tendency toward negative bias, we should compute
the standard deviation of a small sample by the formula

$$s = \sqrt{\frac{\Sigma x^2}{N-1}}$$

rather than by the usual formula, σ = √(Σx²/N) (p. 58). When N
is large the correction effected by using (N − 1) instead of N is
negligible, but when N is small the correction may be considerable.

* Fisher, R. A., The Design of Experiments (1935), pp. 15-16, 38-43.
Tippett, L. H. C., The Methods of Statistics (1937), pp. 69-71.
† Lindquist, E. F., Statistical Analysis in Educational Research (1940),
pp. 48-50. For an interesting demonstration that s is our best estimate of
the population σ, see Goulden, C. H., Methods of Statistical Analysis
(1939), pp. 33-37.
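The difference between the two divisors can be seen on a small set of numbers. This is a minimal sketch with hypothetical scores. In Python's standard library, `statistics.pstdev` divides by N (the usual σ formula) and `statistics.stdev` divides by N − 1 (the s formula).

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical sample, N = 8, mean 5
n = len(scores)

sigma = statistics.pstdev(scores)     # sqrt(sum x^2 / N)
s = statistics.stdev(scores)          # sqrt(sum x^2 / (N - 1))

print(f"sigma = {sigma:.3f}, s = {s:.3f}, ratio = {s / sigma:.3f}")
print(f"sqrt(N / (N - 1)) = {math.sqrt(n / (n - 1)):.3f}")
```

With N = 8 the correction raises the SD by about 7%. With N = 800 it would be well under .1%.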


In the formula for the SE of the mean (σ_M = σ/√N, p. 184),
the σ in the numerator is the population and not the sample SD.
We never actually have the population σ; but we can estimate
it, and our best estimate is s. When N is less than 50 or so, the
formula for the SE of the mean should read:

$$\sigma_M = \frac{s}{\sqrt{N}}, \quad \text{where } s = \sqrt{\frac{\Sigma x^2}{N-1}} \qquad (24)$$

If the SD has already been computed by the formula
σ = √(Σx²/N), we can make the same correction in σ_M given in
(24) by using the formula

$$\sigma_M = \frac{\sigma}{\sqrt{N-1}} \qquad (24a)$$

(standard error of the mean in small samples,
i.e., N less than 50)

No matter what the size of N, formulas (24) and (24a) give
the best estimate of the standard error of the mean, i.e., of the
SD of the sampling distribution of means (Fig. 42, p. 187). In
very large samples the correction effected by using (24) or (24a)
is so small that formula (22) may be safely employed. But
when N is less than 50 it is advisable to use the more exact
formulas, and it is imperative when N is quite small (less than
10, say).
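Formulas (24) and (24a) are algebraically the same quantity, since s/√N = σ√(N/(N − 1))/√N = σ/√(N − 1). The short sketch below checks this on hypothetical data and shows how much smaller formula (22) comes out. The function names are illustrative only.

```python
from __future__ import annotations

import math
import statistics


def se_24(sample: list[float]) -> float:
    """Formula (24): s / sqrt(N), with s computed from N - 1."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def se_24a(sigma: float, n: int) -> float:
    """Formula (24a): sigma / sqrt(N - 1), with sigma computed from N."""
    return sigma / math.sqrt(n - 1)


def se_22(sigma: float, n: int) -> float:
    """Formula (22): sigma / sqrt(N); adequate only for large N."""
    return sigma / math.sqrt(n)


sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0]   # hypothetical, N = 7
sigma, n = statistics.pstdev(sample), len(sample)

print(f"(24): {se_24(sample):.4f}  (24a): {se_24a(sigma, n):.4f}  (22): {se_22(sigma, n):.4f}")
```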
In small samples, the normal curve no longer tells us accurately
the probability of a divergence of our sample mean from the
population mean. The sampling distribution to be used when N
is small is not strictly normal; its "shoulders" are higher than in
the normal curve and the probability of extreme deviations
somewhat greater. Selected values for this sampling distribution,
called "Student's" distribution,* are given in Table 29. For N's
differing in size, this table gives the ±t distances beyond which
(i.e., to left and right) certain percentages of "Student's"
distribution lie (.50, .10, .05, .02, .01). We may illustrate the use
of Table 29 in small samples with a problem.


TABLE 29

A TABLE OF t
For Use in Determining the Reliability of Statistics.
If N Is Large, Tables 17 and 18 May Be Used.

Example: An (N − 1) = 35 and t = 2.03 means that 5 times
in 100 trials a divergence as large as that obtained may be
expected in the positive and negative directions.

Degrees of
 Freedom                  Probability (P)
 (N − 1)      0.50     0.10     0.05     0.02     0.01
     1       1.000     6.31    12.71    31.82    63.66
     2        .816     2.92     4.30     6.96     9.92
     3        .765     2.35     3.18     4.54     5.84
     4        .741     2.13     2.78     3.75     4.60
     5        .727     2.02     2.57     3.36     4.03
     6        .718     1.94     2.45     3.14     3.71
     7        .711     1.90     2.36     3.00     3.50
     8        .706     1.86     2.31     2.90     3.36
     9        .703     1.83     2.26     2.82     3.25
    10        .700     1.81     2.23     2.76     3.17
    11        .697     1.80     2.20     2.72     3.11
    12        .695     1.78     2.18     2.68     3.06
    13        .694     1.77     2.16     2.65     3.01
    14        .692     1.76     2.14     2.62     2.98
    15        .691     1.75     2.13     2.60     2.95
    16        .690     1.75     2.12     2.58     2.92
    17        .689     1.74     2.11     2.57     2.90
    18        .688     1.73     2.10     2.55     2.88
    19        .688     1.73     2.09     2.54     2.86
    20        .687     1.72     2.09     2.53     2.84
    21        .686     1.72     2.08     2.52     2.83
    22        .686     1.72     2.07     2.51     2.82
    23        .685     1.71     2.07     2.50     2.81
    24        .685     1.71     2.06     2.49     2.80
    25        .684     1.71     2.06     2.48     2.79
    26        .684     1.71     2.06     2.48     2.78
    27        .684     1.70     2.05     2.47     2.77
    28        .683     1.70     2.05     2.47     2.76
    29        .683     1.70     2.04     2.46     2.76
    30        .683     1.70     2.04     2.46     2.75
    35        .682     1.69     2.03     2.44     2.72
    40        .681     1.68     2.02     2.42     2.71
    45        .680     1.68     2.02     2.41     2.69
    50        .679     1.68     2.01     2.40     2.68
    60        .678     1.67     2.00     2.39     2.66
    70        .678     1.67     2.00     2.38     2.65
    80        .677     1.66     1.99     2.38     2.64
    90        .677     1.66     1.99     2.37     2.63
   100        .677     1.66     1.98     2.36     2.63
   125        .676     1.66     1.98     2.36     2.62
   150        .676     1.66     1.98     2.35     2.61
   200        .675     1.65     1.97     2.35     2.60
   300        .675     1.65     1.97     2.34     2.59
   400        .675     1.65     1.97     2.34     2.59
   500        .674     1.65     1.96     2.33     2.59
  1000        .674     1.65     1.96     2.33     2.58
     ∞        .674     1.65     1.96     2.33     2.58


Example (2) Ten measures of reaction time to a light
stimulus are taken from one practiced observer. The mean
is 175.50ms, and the σ is 5.82ms. Determine the .05 and the
.01 limits of accuracy for this mean.


From formula (24a), σ_M = 5.82/√9 = 5.82/3 = 1.94ms. Ten
observations have 9 “degrees of freedom,”† and from Table 29,
for 9 or (N − 1) degrees of freedom, we read that t = 2.26 (at the
.05 level) and t = 3.25 (at the .01 level). The quantity t is
distance from the mean expressed in terms of the standard error of
the mean (i.e., t = x/σ_M). From the first t we know that
[when (N − 1) = 9] 95% of the sampling distribution fall
between the mean and ±2.26σ_M, and 5% fall outside of these
limits. From the second t we know that 99% of the sampling
distribution fall between the mean and ±3.25σ_M, and 1% falls
outside these limits. The probability is .95, therefore, that our
sample mean of 175.50ms does not diverge from the population
mean by more than ±4.38ms (±2.26 × 1.94); and the
probability is .05 that its divergence is greater than ±4.38ms. At
the .01 level, the probability is .99 that our mean of 175.50ms
does not diverge from the population mean by more than
±6.31ms (±3.25 × 1.94); and the probability is .01 that its
divergence is greater than ±6.31ms.

* Fisher, R. A., Statistical Methods for Research Workers (8th ed., 1941),
pp. 116-117.

† When the sum (or mean) of 10 measures is known, only 9 may be
selected “freely,” as the sum (or M) fixes the 10th. Accordingly, there
are 9 degrees of freedom for 10 measures, and in general (N − 1) degrees
of freedom for N measures. See also page 257.
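Example (2) can be reproduced in a few lines. This is a minimal sketch. The two t values are read from Table 29 for 9 degrees of freedom and typed in by hand, because Python's standard library has no t distribution. The dictionary name `T_9DF` is hypothetical.

```python
import math

mean, sigma, n = 175.50, 5.82, 10          # Example (2), in milliseconds
T_9DF = {0.05: 2.26, 0.01: 3.25}           # from Table 29, (N - 1) = 9

se = sigma / math.sqrt(n - 1)              # formula (24a): 1.94 ms

for level, t in T_9DF.items():
    margin = t * se
    print(f"{level} level: {mean - margin:.2f} to {mean + margin:.2f} ms (± {margin:.2f})")

# For comparison only: the large-sample formula (22) with normal-curve multipliers
se_large = sigma / math.sqrt(n)            # about 1.84 ms
print(f"formula (22): ± {1.96 * se_large:.2f} and ± {2.58 * se_large:.2f} ms")
```

The comparison line gives margins of about ±3.61 and ±4.75ms, noticeably narrower than the ±4.38 and ±6.31ms obtained with formula (24a) and Table 29.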

Several points in the solution of this problem deserve comment,
as they illustrate clearly the difference between the treatment
of large and small samples. In the first place, had we
used formula (22) instead of the correct formula (24a), the SE of
the mean would have been 1.84 (5.82/√10) instead of 1.94; i.e.,
about 5% too small. Again, the .05 and .01 accuracy limits in the
normal curve are, as we have seen, ±1.96σ_M and ±2.58σ_M,
respectively. These limits are roughly 13% and 21% smaller than
the corresponding t limits, ±2.26 and ±3.25, got from Table 29
when (N − 1) is 9. It is clear, therefore, that when N is small, use
of formula (22) will cause a calculated mean to appear more
reliable than it actually is.

The reader should note that if formula (24a) and Table 29 are used in determining the reliability of the mean in our height problem (p. 184), the results will not differ to the second decimal from those obtained with formula (22) and Table 17. This is because of the very large sample (8585) used there. As N increases, entries in Table 29 approach more and more closely the corresponding normal curve entries in Table 17. In the normal curve, for instance (Table 17), 10% of the distribution lies beyond the limits ± 1.65$\sigma_M$, 5% beyond the limits ± 1.96$\sigma_M$, and 1% beyond ± 2.58$\sigma_M$. In Table 29 the corresponding t limits for (N − 1) = 50 are ± 1.68, ± 2.01, ± 2.68; for (N − 1) = 100, the limits are ± 1.66, ± 1.98, ± 2.63. When N is very large (see last entries in Table 29), the points beyond which specified percents of the distribution lie are virtually the same in Table 29 as in Table 17, and “Student's” distribution becomes a normal probability curve. Table 29 may be generally used, then, with large as well as with small samples.
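The approach of t to the normal curve shows up clearly when the normal-curve points are set beside the tabled t points. This sketch computes the normal points with Python's standard-library `statistics.NormalDist`. The t points for 50 and 100 degrees of freedom are standard two-tailed table values typed in by hand, not computed.

```python
from statistics import NormalDist

# Two-tailed probabilities P and the normal-curve points beyond which they lie (cf. Table 17).
LEVELS = (0.10, 0.05, 0.01)
z_points = [NormalDist().inv_cdf(1 - p / 2) for p in LEVELS]

# Two-tailed t points for 50 and 100 degrees of freedom (cf. Table 29), entered by hand.
t_points = {50: (1.68, 2.01, 2.68), 100: (1.66, 1.98, 2.63)}

for df, ts in t_points.items():
    for p, z, t in zip(LEVELS, z_points, ts):
        # t always lies a little beyond z; the excess shrinks as df grows.
        print(f"df={df:>3}  P={p:.2f}  z={z:.3f}  t={t:.2f}  excess={t - z:.3f}")
```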


2. The Reliability of the Median 

The standard error and the probable error of the median may be computed directly from formulas for determining the reliability of the mean. The $\sigma_{Mdn}$ and $PE_{Mdn}$ are 1.2533 (roughly 5/4) times the $\sigma_M$ and the $PE_M$, respectively. Thus

$$\sigma_{Mdn} = \frac{1.2533\,\sigma}{\sqrt{N}} \qquad (25)$$
(standard error of the median when N is large)

$$PE_{Mdn} = \frac{1.2533 \times .6745\,\sigma}{\sqrt{N}} = \frac{.8453\,\sigma}{\sqrt{N}} \qquad (26)$$

or, in terms of Q,*

$$PE_{Mdn} = \frac{1.2533\,Q}{\sqrt{N}} \qquad (26a)$$
(probable error of the median when N is large)

When samples are small (less than 50, say), (N − 1) should replace N in the denominators of these formulas, and Table 29 should be used in setting up accuracy limits at different levels of confidence.

An example will illustrate the use of formula (26a).

Example (3) On the Trabue Language Scale A,† 801 twelve-year-old boys made the following record: Median = 21.4; Q = 4.9. How reliable is this median? How well does it represent the median of twelve-year-old boys in general on the given scale?

* The quartile deviation calculated from a frequency distribution is
† Trabue, M. R., Completion Test Language Scales, Teachers College, Columbia University Contributions to Education, 77 (1916), 15.


Since N is quite large, we may use formula (26a) to find $PE_{Mdn}$ equal to .22. In a normal distribution, the middle 95% of cases lie between the mean and ± 2.9PE, and the middle 99% between the median and ± 3.8PE (see Table 18). We may say with considerable confidence, therefore (odds 19:1), that 21.4 does not diverge from its true value by more than ± .64 (± 2.9 × .22); and with much greater confidence (odds 99:1) that 21.4 does not miss the population median by more than ± .84 (± 3.8 × .22).
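Here is a minimal sketch of the median computation. It assumes formula (26a) has the form 1.2533Q/√N, which yields the PE of .22 used above. As in the text, the PE is rounded to two decimals before it is multiplied by 2.9 and 3.8.

```python
import math


def pe_median_from_q(q, n):
    """Probable error of the median from Q, assuming formula (26a) is 1.2533 * Q / sqrt(N)."""
    return 1.2533 * q / math.sqrt(n)


# Trabue Language Scale A: Median = 21.4, Q = 4.9, N = 801.
median = 21.4
pe = round(pe_median_from_q(4.9, 801), 2)   # rounded to .22, as in the text
limit_95 = 2.9 * pe   # middle 95% of a normal curve lies within ±2.9 PE
limit_99 = 3.8 * pe   # middle 99% lies within ±3.8 PE

print(f"PE_Mdn = {pe}")
print(f"odds 19:1 -> {median - limit_95:.2f} to {median + limit_95:.2f}")
print(f"odds 99:1 -> {median - limit_99:.2f} to {median + limit_99:.2f}")
```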


III. THE RELIABILITY OF MEASURES OF VARIABILITY


1. The Reliability of the Standard Deviation, or σ

As was true of the mean and median, the reliability of an obtained standard deviation is determined by calculating the probable discrepancy between the sample σ and the true σ. A true σ is the standard deviation of the population from which the sample was drawn. The formula for calculating the reliability of an obtained σ is:

$$SE_\sigma \text{ or } \sigma_\sigma = \frac{\sigma}{\sqrt{2N}} = \frac{.71\,\sigma}{\sqrt{N}} \qquad (27)$$
(standard error of a standard deviation when N is large)

When N is less than about 50, formula (27) should read:

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2(N-1)}} = \frac{.71\,\sigma}{\sqrt{N-1}} \qquad (27a)$$
(standard error of a standard deviation when N is small)


On page 184 we found that for 8585 British males, the standard deviation around the mean of 67.46 inches was 2.57 inches. Since the sample is large, we may use formula (27) to get $\sigma_\sigma = \frac{2.57}{\sqrt{2 \times 8585}} = .02$ inch. We may be confident (probability .95) that the population σ is not larger than 2.61 nor smaller than 2.53 inches (2.57 ± 1.96 × .02). And we may feel very confident (probability .99) that the population σ is not greater than 2.62 nor less than 2.52 (2.57 ± 2.58 × .02). These relations are shown in Figure 43, in which the true σ is represented as the mean of a sampling distribution of σ's, i.e., the distribution of σ's computed from successive samples. The standard deviation of this normal distribution is .02, the standard error of σ.
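Formulas (27) and (27a) can be written as a single Python function. The sketch below applies it to the British-males data, and the printed limits can be compared with the ones given above. The name `se_of_sd` and its `small_sample` flag are illustrative choices.

```python
import math


def se_of_sd(sd, n, small_sample=False):
    """SE of a standard deviation: sd / sqrt(2N), formula (27);
    sd / sqrt(2(N - 1)) when small_sample is True, formula (27a)."""
    denominator = 2 * (n - 1) if small_sample else 2 * n
    return sd / math.sqrt(denominator)


# British males from the text: SD = 2.57 inches, N = 8585.
sd = 2.57
se = se_of_sd(sd, 8585)
print(f"SE of the SD: {se:.3f} inch")
for z, prob in ((1.96, ".95"), (2.58, ".99")):
    print(f"P = {prob}: {sd - z * se:.2f} to {sd + z * se:.2f} inches")
```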

Fig. 43.

In the problem on page 191 the calculated SD was 5.82ms, around the mean of 175.50ms. Since N is only 10, we use formula (27a) to get $\sigma_\sigma = \frac{5.82}{\sqrt{2 \times 9}} = 1.37$ms. From Table 29 we find, as before, that for (N − 1) = 9 the accuracy limits at .05 and .01 are ± 2.26 and ± 3.25, respectively. We may feel confident, therefore (the probability is .95), that the population σ is not larger than 8.92ms nor smaller than 2.72ms (5.82 ± 2.26 × 1.37). And we may feel very confident (the probability is .99) that the population σ is not larger than 10.27ms nor smaller than 1.37ms (5.82 ± 3.25 × 1.37). Note that if formula (27) had been used, the standard error of our σ would have been $\frac{5.82}{\sqrt{20}}$ or 1.30ms instead of 1.37ms. Thus had we used large instead of small sample methods, the limits of accuracy at the .05 level would have been 8.37ms and 3.27ms (5.82 ± 1.96 × 1.30) instead of 8.92ms and 2.72ms, the correct values.
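The contrast between the small-sample and large-sample limits for σ can be reproduced in a few lines of Python. This is a sketch only. The Table 29 entries 2.26 and 3.25 are hard-coded as quoted in the text, and each SE is rounded to two decimals before the limits are formed, as the text does.

```python
import math

SD, N = 5.82, 10          # reaction-time problem: SD = 5.82ms, N = 10
T_05, T_01 = 2.26, 3.25   # Table 29 entries for (N - 1) = 9, as quoted in the text

# Standard errors rounded to two decimals, as the text does before forming limits.
se_small = round(SD / math.sqrt(2 * (N - 1)), 2)   # formula (27a): the correct choice here
se_large = round(SD / math.sqrt(2 * N), 2)         # formula (27): too small when N = 10


def limits(se, crit):
    """Lower and upper accuracy limits SD ± crit * se, rounded to .01ms."""
    return round(SD - crit * se, 2), round(SD + crit * se, 2)


print("small-sample, .05:", limits(se_small, T_05))
print("small-sample, .01:", limits(se_small, T_01))
print("large-sample, .05 (misleadingly narrow):", limits(se_large, 1.96))
```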

The reader should note two facts: (1) the relatively wide range of likely values for σ (high unreliability) when N is small; and (2) the greater apparent reliability of the standard deviation when large sample methods are incorrectly used with small samples. Because of (2), it is wise to use formula (27a) and Table 29 even when N is fairly large.


2. The Reliability of the Quartile Deviation or Q

The reliability of the Q of a distribution may be found from the formula

$$\sigma_Q = \frac{.787\,\sigma}{\sqrt{N}} \qquad (28)$$
(standard error of Q in terms of the σ of the distribution)*

or from the formula

$$\sigma_Q = \frac{1.17\,Q}{\sqrt{N}} \qquad (28a)$$
(standard error of Q in terms of the Q of the distribution)*

On page 193, the median score of the 801 twelve-year-old boys who took the Trabue Completion Test, Scale A, was 21.4 with a Q of 4.9. Since N is large, we may use formula (28a) to find $\sigma_Q$ equal to .20. Adopting the .05 level, we may be confident that the population Q lies between 4.5 and 5.3 (4.9 ± 1.96 × .20). Stated differently, there is only 1 chance in 20 that the sample Q of 4.9 differs from the population Q by more than ± .40 (i.e., ± 1.96 × .20).

* When N is less than 50, formulas (28) and (28a) should be written $\sigma_Q = \frac{.787\,\sigma}{\sqrt{N-1}}$ or $\sigma_Q = \frac{1.17\,Q}{\sqrt{N-1}}$, and Table 29 used in tests of significance.
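A short sketch of formula (28a) applied to the Trabue data follows. It assumes (28a) has the form 1.17Q/√N, which gives the $\sigma_Q$ of .20 found above. The `small_sample` switch follows the footnote's substitution of (N − 1) for N.

```python
import math


def se_of_q(q, n, small_sample=False):
    """SE of the quartile deviation in terms of Q: 1.17 * Q / sqrt(N), formula (28a).
    With small_sample=True, (N - 1) replaces N, as in the footnote."""
    return 1.17 * q / math.sqrt(n - 1 if small_sample else n)


# Trabue Completion Test, Scale A: Q = 4.9, N = 801.
q = 4.9
se_q = round(se_of_q(q, 801), 2)
half_width = 1.96 * se_q
print(f"SE of Q = {se_q}; .05 limits: {q - half_width:.1f} to {q + half_width:.1f}")
```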




IV. THE RELIABILITY OF THE DIFFERENCE BETWEEN TWO MEASURES


1. The Reliability of the Difference between Two Means 

Suppose we wish to discover whether there is any difference 
between fifth-grade boys and fifth-grade girls in their knowledge 
of words. The usual method of attacking this problem is to
select a large and random sample of fifth-grade boys and girls; 
administer a vocabulary test; compute the means; and find 
the difference between the two means. If this difference is five 
points, let us say, in favor of the girls, this result — on the face 
of it — may be taken as evidence that the typical fifth-grade 
girl knows more words than the typical fifth-grade boy. We 
cannot be certain of this conclusion, however, if all we have is 
the obtained difference of five points, as it is quite possible that 
the difference between the means of other samples of boys and 
girls (comparable to our own groups) might turn out to be zero 
or might even be reversed in favor of the boys. 

When can we feel reasonably sure that a difference is “real” and not accidental? The answer to this question can never be absolute; it always involves a statement of probability and is usually expressed in terms of the accuracy limits discussed in the last section. A difference is said to be significant (i.e., reliable or dependable) when the evidence is strong that the result found cannot be attributed solely to accidents of sampling. By the same token, a difference is non-significant when we are confident that it might easily have arisen from sampling fluctuations, and hence implies no “real” difference.

Clearly it is important that we have some way of estimating the significance of an obtained difference; that is, some way of telling whether two groups are sufficiently different to enable us to say with confidence that no matter how often other similarly selected samples are compared, some difference will persist. Furthermore, and equally important, if the obtained difference is not significant, we want to know, if possible, how near it approaches to significance.


(1) The Standard Error of the Difference When Means Are Uncorrelated ($\sigma_D$)

The formula for calculating the significance of the difference between two sample means when we are dealing with independent or uncorrelated measures is

$$\sigma_D \text{ or } \sigma_{M_1 - M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2} \qquad (29)$$
(standard error of the difference between two uncorrelated means, N's large)*

in which $\sigma_{M_1}$ is the standard error of the mean of the first group; $\sigma_{M_2}$ is the standard error of the mean of the second group; and $\sigma_D$ is the standard error of the difference between the two means. Means are uncorrelated when calculated from different groups, or from uncorrelated tests administered to the same group. From formula (29) it is clear that, to find the reliability of the difference between two means, we must first know the reliabilities of the means themselves.

The application and interpretation of formula (29) may be illustrated by the following example:

Example (1) In a study of the intelligence of the foreign-born white draft during World War I, a sample of 611 native-born Norwegians and a sample of 129 native-born Belgians were found to test as follows upon the “combined scale”†:

Country of Birth    Number of Cases    Mean Score    σ
Norway              611                12.98         2.47
Belgium             129                12.79         2.42

* When the PE's of the means have been computed, the PE of the difference between two means is

$$PE_D \text{ or } PE_{M_1 - M_2} = \sqrt{PE_{M_1}^2 + PE_{M_2}^2} \qquad (30)$$

† The “combined scale” included the eight Alpha tests, the Stanford-Binet, and tests 4, 5, 6, and 7 from Beta. The maximum score was 25. For the data given in this problem, see Brigham, C. C., A Study of American Intelligence (1923), pp. 120-121.

The difference between the two obtained means is .19 (12.98 − 12.79) in favor of the Norwegians. Is this difference significant? That is to say, would further testing of similar samples of Norwegians and Belgians give virtually the same result; or is it probable that the mean difference would be reduced to zero, or even reversed in favor of the Belgians? To answer these questions we must first compute the standard errors of the means of Norwegians and Belgians, and from these data find the reliability of the difference between the means. By formula (22), the standard errors of the two means are


Norwegians: $\sigma_{M_1} = \frac{2.47}{\sqrt{611}} = .0999$

Belgians: $\sigma_{M_2} = \frac{2.42}{\sqrt{129}} = .2130$

Substituting these standard errors in formula (29), we have

$$\sigma_D = \sqrt{(.0999)^2 + (.2130)^2} = .24 \quad \text{(to two decimals)}$$

The actual difference between the means of Norwegians and Belgians, then, is .19, and the SE of this difference ($\sigma_D$) is .24. Let us assume that the difference between the population means of Norwegians and Belgians is zero, and that except for accidental errors mean differences from sample to sample would all be zero. In making this assumption we are setting up the “null hypothesis,” a proposition somewhat akin to the legal principle that a man is innocent until he is proved guilty (p. 232). In our problem, for example, we inquire whether, in view of its SE, the mean difference of .19 is really large enough to cast grave doubt upon (i.e., disprove) the null hypothesis.

As a first step in testing our hypothesis, we compute a “critical ratio” or CR, found by dividing the obtained difference by its SE ($D/\sigma_D = CR$). In the present problem, the CR = .19/.24, or .79. The sampling distribution of differences among sample means is known to be normal when N is reasonably large. Hence we may set up a normal distribution like that shown in Figure 44, in which the mean is set at zero (“true difference”) and the σ of the distribution of differences is $\sigma_D$, or .24.* The CR tells us that our obtained difference of .19 falls at a point .79$\sigma_D$ from the hypothetical mean of zero; and a difference of −.19 will, of course, fall at −.79$\sigma_D$. The value −.19 is obtained when the mean of the Belgians is higher than the mean of the Norwegians by .19.

* $\sigma_D$ is the best estimate we have of the SD of the sampling distribution of differences (p. 189).

Now from Table 17 we know that 29% × 2, or 58%, of the cases in a normal distribution fall between the mean and ± .79$\sigma_D$; and 42% of the cases fall outside of these limits. Even when the true difference is zero, then, we can expect differences larger than ± .19 to occur by chance forty-two times in 100 comparisons of Norwegians and Belgians. A difference of ± .19, therefore, might easily arise from sampling errors and is clearly not significant. Accordingly, we retain the null hypothesis and conclude with confidence that, on present evidence, there is no real difference between Norwegians and Belgians on the combined scale. When the null hypothesis is not disproved (as here), the result is often stated as follows: there is good reason to believe that our two samples were drawn from the same parent population and differ only by sampling errors.
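The whole test for the Norwegian and Belgian data can be sketched in Python with the standard library alone. Formula (22) gives each SE, formula (29) combines them, and `statistics.NormalDist` replaces the lookup in Table 17. The computed two-tailed P comes out close to the 42% read from the table. The small gap arises because the table was entered at the rounded CR of .79.

```python
import math
from statistics import NormalDist


def se_mean(sd, n):
    """Standard error of a mean for a large sample, formula (22): sd / sqrt(N)."""
    return sd / math.sqrt(n)


se_norway = se_mean(2.47, 611)
se_belgium = se_mean(2.42, 129)
se_diff = math.sqrt(se_norway**2 + se_belgium**2)   # formula (29), uncorrelated means

diff = 12.98 - 12.79
cr = diff / round(se_diff, 2)          # the text divides by the rounded value .24
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(cr)))

print(f"SE_D = {se_diff:.2f}, CR = {cr:.2f}, two-tailed P = {p_two_tailed:.2f}")
```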

Fig. 44. Sampling distribution of differences, with the “true difference” of zero at the mean; CR = D/$\sigma_D$ = .19/.24 = .79.

So far we have dealt with the probability that, on the null hypothesis, the Norwegians are better than the Belgians by .19, and the probability that the Belgians are better than the Norwegians by .19 (−.19). In many, perhaps most, experiments, however, we are mainly concerned with the direction of the difference, i.e., with the probability that the obtained difference, or a larger one in the same direction, might have arisen on the null hypothesis. In studying the effects of practice and other experimental factors, for instance, usually we want to know the probability that one group (the experimental, say) is “really” better than the other group (the control); or we inquire the probability that boys are better than girls in mechanical aptitude or in some other ability. In such cases we deal only with the positive end of the sampling distribution of differences. To illustrate, we have found in Example (1) that the Norwegians are .19 point higher than the Belgians. What is the probability that a difference of .19 or more in favor of the Norwegians would arise by chance alone? From Figure 44 we know that a difference of .19 will be exceeded by chance 21% of the time. Even when the true difference is zero, then, we could expect to find the Norwegians better than the Belgians by more than .19 point in about 1/5 of our comparisons. The difference of .19 (or more) might readily be ascribed to chance, therefore (its probability P = .21), and there is no reason for believing the Norwegians to be better in general than the Belgians on the combined scale.
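The one-tailed probability is simply the area beyond the CR at one end of the normal curve. Here is a minimal sketch using `statistics.NormalDist`. The helper name `one_tailed_p` is illustrative.

```python
from statistics import NormalDist


def one_tailed_p(diff, se_diff):
    """Probability, if the true difference is zero, of a positive difference this large or larger."""
    return 1 - NormalDist().cdf(diff / se_diff)


p_nor_bel = one_tailed_p(0.19, 0.24)   # Norwegians ahead of Belgians by .19
print(f"P (one tail) = {p_nor_bel:.2f}")
```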
(2) Interpretation of Differences in Terms of Significance Levels

(a) The .05 level of significance

An investigator often sets up some arbitrary standard of significance on the basis of which he rejects or retains the null hypothesis. From Table 17 or 29 (last line in table) we find that 1.96 marks the point in the normal distribution to the left and right of which lie 5% of the cases (2½% at each end). If a CR is 1.96 or more and the N is large, therefore, we reject the null hypothesis with some confidence on the grounds that the given difference can hardly be attributed to sampling errors.

The CR of .79 in the problem of Norwegians and Belgians falls far below the .05 level of significance, for which a CR of 1.96 is necessary. All we need say in this problem, therefore, is that we retain the null hypothesis with confidence, since on the evidence there is no reason to suspect a true mean difference between Norwegians and Belgians.

Significance levels may also be used when we are interested in 
the probability that one group is better than the other. From 
Table 29 we know that 10% (P) of the cases in a normal dis-
tribution lie to the left and right of t = 1.65; hence, 5% (P/2) 
lie to the right of 1.65. If a CR is 1.65, therefore, we can say 
with confidence that (on the assumption of a true difference of 
zero) only once in twenty trials would a larger positive difference 
than that obtained appear by chance. 

From Figure 44 we have found that twenty-one times in 100 
trials a difference between Norwegians and Belgians of more
than + .19 might be expected on the null hypothesis. Because 
of the large chance expectation of a positive difference of .19 
or more, we can feel sure that the Norwegians are not superior 
to the Belgians on the combined scale. 

A second example may serve to clarify certain points discussed above. Suppose that the difference between the means of an experimental Group A and a control Group B upon Test X is six points, that $\sigma_D$ is 3, and that the N's are quite large. Since the CR of 6/3 or 2 is slightly greater than 1.96, this result may be considered significant at the .05 level. We reject the null hypothesis with confidence, therefore, since it is quite unlikely (odds 19:1) that a critical ratio of 2 (absolute difference of ± 6) would occur if the difference between the population means of A and B were in fact zero. We could expect a difference of more than 6 (positive direction) to appear in favor of the experimental group not more than two or three times in 100 trials. Hence, we are justified in asserting that Group A is, in general, superior to Group B in Test X. Figure 45 shows graphically the relations represented in this problem.
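The decision rules described so far can be collected into one small function. This is a sketch for large N only. The points 1.96 and 2.58 (two-tailed) and 1.65 and 2.33 (one-tailed) are the ones the text uses, and the function name and return strings are illustrative choices.

```python
def significance(cr, one_tailed=False):
    """Classify a critical ratio (large N) by the conventional levels used in the text.

    Two-tailed points: 1.96 (.05) and 2.58 (.01).
    One-tailed points (positive direction only): 1.65 (.05) and 2.33 (.01).
    """
    low, high = (1.65, 2.33) if one_tailed else (1.96, 2.58)
    stat = cr if one_tailed else abs(cr)
    if stat >= high:
        return ".01"
    if stat >= low:
        return ".05"
    return "not significant"


print("Norwegians vs Belgians:", significance(0.19 / 0.24))
print("Group A vs Group B:", significance(6 / 3))
print("Group A vs Group B, one-tailed:", significance(6 / 3, one_tailed=True))
```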

Fig. 45. Sampling distribution of differences with the “true difference” of zero at the mean; 2½% of the area lies below −1.96$\sigma_D$ and 2½% above +1.96$\sigma_D$. $\sigma_D$ = 3, CR = 6/3 = 2.

Still another way of interpreting the significance of a difference is in terms of the “accuracy limits” discussed on page 187. In the problem of Norwegians and Belgians, for instance, we obtained a difference of .19 with a $\sigma_D$ of .24. We may be confident, therefore (odds 19:1), that the difference between Norwegians and Belgians lies within the limits −.28 and +.66 (.19 ± 1.96 × .24). Since the lower limit of this range is negative, it is quite clear (as found before) that the difference between these groups could well be zero. In the second problem, the difference between control and experimental groups was six points with a $\sigma_D$ of 3. This difference is twice its standard error and hence is significant (p. 208). We not only assert a significant difference, therefore, but put its value with considerable confidence as lying between 0 and 12 points (6 ± 1.96 × 3).
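The accuracy-limit reading of a difference amounts to checking whether the interval D ± 1.96$\sigma_D$ contains zero. Here is a minimal sketch using the two examples above. The exact limits for Groups A and B come out at .12 and 11.88, which the text rounds to 0 and 12.

```python
def difference_limits(diff, se_diff, crit=1.96):
    """Limits diff ± crit * se_diff for the population difference (odds 19:1 when crit = 1.96)."""
    return diff - crit * se_diff, diff + crit * se_diff


for label, diff, se_diff in (("Norwegians - Belgians", 0.19, 0.24), ("Group A - Group B", 6, 3)):
    lo, hi = difference_limits(diff, se_diff)
    verdict = "could be zero" if lo <= 0 <= hi else "excludes zero"
    print(f"{label}: {lo:.2f} to {hi:.2f} ({verdict})")
```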


(b) The .01 level of significance 

While the .05 level is sufficiently exacting for most investigations, the .01 level is demanded by many research workers. From Table 17 or Table 29 (last line) we read that ± 2.58 mark the points in the normal curve to the left and right of which lies 1% of the cases (½% at each end). If a CR is 2.58 or more, therefore, and N's are large, we reject the null hypothesis with great confidence, as only once in 100 trials would a larger difference arise from sampling errors when the true difference is zero. If the critical ratio is 2.33 (P = .02 and P/2 = .01), we may be very confident (odds 99:1) that the group now ahead is really superior to the second group in mean attainment.


(3) The Reliability of the Difference between Means in Small Independent Samples

When the N's of two independent samples are small (less than 50, say), the SE's of the means should be calculated by formula (24a) or some variation of it. Table 29 may be used conveniently in testing the significance of the critical ratio or t. An example will demonstrate the method to be employed.


Example (2) A test of mechanical aptitude is administered to six boys in Class 1, and to ten boys in Class 2 of a given vocational school. Is Class 1 significantly better than Class 2? Data are as follows:

            Class 1                          Class 2
     Scores     x     x²              Scores     x     x²
       28      −2      4                30       2      4
       35       5     25                26      −2      4
       32       2      4                25      −3      9
       24      −6     36                34       6     36
       26      −4     16                20      −8     64
       35       5     25                28       0      0
                                        31       3      9
                                        24      −4     16
                                        32       4     16
                                        30       2      4
      180            110               280            162
     $M_1$ = 30                        $M_2$ = 28
     ($N_1$ − 1) = 5                   ($N_2$ − 1) = 9

$$SD \text{ or } s = \sqrt{\frac{162 + 110}{14}} = 4.41 \qquad \text{by (31)}$$

$$SE_D = 4.41\sqrt{\frac{6 + 10}{6 \times 10}} = 2.28 \qquad \text{by (32)}$$

For ($N_1$ − 1) + ($N_2$ − 1) or 14 degrees of freedom, the .10 level for t is 1.76 (Table 29).


The mean of the six boys in Class 1 is 30, the mean of the ten boys in Class 2 is 28, and the mean difference of 2 is to be tested for significance. When two samples are small (as here), we get a better estimate of the population SD by pooling the sums of squares from the two groups and computing one SD. The justification for this pooling procedure is that on the null hypothesis the real difference between the two classes is zero; hence the two samples may be treated as though they were drawn from the same population.* Moreover, increasing the N gives a more stable SD based on all of the observations. The sum of the squares in Class 1 around the mean of 30 is 110; and the sum of the squares in Class 2 around the mean of 28 is 162. The degrees of freedom in Class 1 are ($N_1$ − 1) or 5, and the degrees of freedom in Class 2 are ($N_2$ − 1) or 9. By formula (31), $s = \sqrt{(110 + 162)/14}$ or 4.41, and this SD serves as the standard deviation for each of our groups. The SE of $M_1$ is $\frac{4.41}{\sqrt{6}}$ and the SE of $M_2$ is $\frac{4.41}{\sqrt{10}}$. Combining these by formula (29), we have

$$SE_D = \sqrt{\frac{(4.41)^2}{6} + \frac{(4.41)^2}{10}} = 2.28$$

Formula (32), on page 206, combines the two SE's directly. The CR or t is $D/SE_D$, or 2/2.28 = .88. The df† in the two groups (viz., 5 and 9) are combined to give 14 df to be used in evaluating the mean difference. From Table 29 for 14 degrees of freedom we find the entry 1.76 at the .10 level. The critical ratio of .88 falls far below 1.76. Hence the difference of +2 is not significant at the .05 level, and there is no reason to believe Class 1 superior to Class 2. It must be remembered that at .10, 5% of our t's (CR's) lie to the right of +1.76 and 5% lie to the left of −1.76. The limit at .10 (not at .05) must be taken, therefore, to give the .05 significance level, if we are interested (as here) in knowing the probability that the given difference or a greater positive one might arise from sampling errors. Figure 45a illustrates this point.

* We assume the null hypothesis to hold until it is disproved.
† df = degrees of freedom.

Fig. 45a. For 14 degrees of freedom, 5% of the distribution lies to the left of −1.76 and 5% to the right of +1.76 (Table 29).

The formulas used in testing the significance of a mean difference in small independent samples may be written as follows:

$$SD \text{ or } s = \sqrt{\frac{\Sigma(X_1 - M_1)^2 + \Sigma(X_2 - M_2)^2}{(N_1 - 1) + (N_2 - 1)}} \qquad (31)$$
(standard deviation when two small independent samples are pooled)

$$SE_D \text{ or } s_D = s\sqrt{\frac{N_1 + N_2}{N_1 N_2}} \qquad (32)$$
(standard error of the difference between means in small independent samples)

In formula (31), $\Sigma(X_1 - M_1)^2$ or $\Sigma x_1^2$ is the sum of the squared deviations around the mean of sample 1; and $\Sigma(X_2 - M_2)^2$ or $\Sigma x_2^2$ is the sum of the squared deviations around the mean of sample 2. These sums of squares are combined as shown above in order to give a better estimate of the SD. In computing the SE of the difference between means, the SE of each mean is calculated from the same SD; hence formula (32) enables us to calculate $SE_D$ directly.
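Formulas (31) and (32) can be applied directly to the raw scores of Example (2), as in the sketch below. One assumption needs flagging. The tenth Class 2 score (30) is not legible in this copy and is inferred from the column totals (sum 280, sum of squares 162), which it satisfies exactly. The 1.76 threshold is the Table 29 entry for 14 df quoted in the text.

```python
import math

# Scores from Example (2). The tenth Class 2 score (30) is inferred from the column
# totals (sum 280, sum of squares 162) rather than read directly from the table.
class_1 = [28, 35, 32, 24, 26, 35]
class_2 = [30, 26, 25, 34, 20, 28, 31, 24, 32, 30]


def sum_of_squares(scores):
    """Sum of squared deviations around the sample's own mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


n1, n2 = len(class_1), len(class_2)
m1, m2 = sum(class_1) / n1, sum(class_2) / n2
df = (n1 - 1) + (n2 - 1)

s = math.sqrt((sum_of_squares(class_1) + sum_of_squares(class_2)) / df)  # formula (31)
se_d = s * math.sqrt((n1 + n2) / (n1 * n2))                             # formula (32)
t = (m1 - m2) / se_d

print(f"M1 = {m1}, M2 = {m2}, s = {s:.2f}, SE_D = {se_d:.2f}, t = {t:.2f}, df = {df}")
print("beyond the one-tailed .05 point (1.76)?", t > 1.76)
```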


A second example will serve to illustrate further the use of significance levels when samples are small.

Example (3) On an arithmetic reasoning test, thirty-one ten-year-old boys and forty-two ten-year-old girls made the following scores:

           Mean      σ      N
Boys:     40.39    8.69    31
Girls:    35.81    8.33    42

Is the mean difference of 4.58 between boys and girls significant?

We may calculate the $\sigma_D$ directly by formula (29) to be:

$$\sigma_D = \sqrt{\frac{(8.69)^2}{30} + \frac{(8.33)^2}{41}} = 2.05$$

Note that the standard errors of the means are calculated by formula (24a): (N − 1) for boys is 30, and (N − 1) for girls, 41.

The t or critical ratio is 4.58/2.05, or 2.23, and the degrees of freedom to be used in testing the significance of the difference (i.e., 4.58) are 30 + 41, or 71. We may take the t's for 70 degrees of freedom in Table 29 without interpolation, as these t's furnish a close approximation to the t's for 71 degrees of freedom. When the degrees of freedom equal 70, a t of ± 2.00 or more may be expected on the null hypothesis 5% of the time, and a t of ± 2.65 or more may be expected 1% of the time. The obtained t of 2.23 passes the .05 but not the .01 level. We may, therefore, reject the null hypothesis with considerable confidence. Moreover, we can assert not only that the difference between boys and girls is significant, but that its value (odds 19:1) lies between .48 and 8.68 (4.58 ± 2.00 × 2.05).
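Example (3) reduces to a few lines of Python, sketched below. Each SE uses (N − 1) as in formula (24a), and the two SEs are combined by formula (29). The critical values 2.00 and 2.65 are the Table 29 entries for 70 df quoted in the text, hard-coded here.

```python
import math


def se_mean_small(sd, n):
    """SE of a mean with (N - 1) in the denominator, as in formula (24a)."""
    return sd / math.sqrt(n - 1)


se_boys = se_mean_small(8.69, 31)
se_girls = se_mean_small(8.33, 42)
se_diff = math.sqrt(se_boys**2 + se_girls**2)    # formula (29)

diff = 40.39 - 35.81
t = diff / se_diff
df = (31 - 1) + (42 - 1)

# Table 29 entries for 70 df, used in the text in place of 71 df.
T_05, T_01 = 2.00, 2.65
half_width = T_05 * round(se_diff, 2)
print(f"SE_D = {se_diff:.2f}, t = {t:.2f}, df = {df}")
print("passes .05:", t >= T_05, " passes .01:", t >= T_01)
print(f"odds 19:1 limits: {diff - half_width:.2f} to {diff + half_width:.2f}")
```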


(4) The Use of Table 29 in Determining the Significance of a Difference

At the risk of repetition, it may be helpful to summarize the applications of Table 29 to the problem of determining the reliability of differences. For varying degrees of freedom, Table 29 gives the values (t's or CR's) to the left and right of which lie certain proportions of “Student's” distribution (p. 190). The t's from Table 29 equal the critical ratios of the normal curve exactly when N is very large (i.e., ∞), and approximate them quite closely when N's are 50 or more.

In general, t's are tested against the null hypothesis, i.e., against the assumption that there is no true difference between the population means being compared, and that our two samples differ only through sampling accidents. Depending upon the evidence, we refute or retain the null hypothesis. When groups are independent, the degrees of freedom used in testing the significance of a difference equal ($N_1$ − 1) + ($N_2$ − 1), where $N_1$ is the size of the first and $N_2$ the size of the second sample. If the degrees of freedom equal 20, we reject the null hypothesis at the .05 level if t equals or exceeds 2.09, and at the .01 level if t equals or exceeds 2.84. For t's less than 2.09 we accept the null hypothesis and mark the difference “not significant.” When the degrees of freedom equal 30, a t of .683 stands for a difference which might be expected to occur fifty times in 100 trials through sampling errors alone. For any P (probability) greater than .05, the null hypothesis is retained and the difference is marked not significant.

For many years it has been customary for investigators to demand a critical ratio of 3 or more before a difference is regarded as significant. This extremely high standard sets up a confidence level which is probably not warranted in many experimental studies.


(5) The Standard Error of the Difference between Two Means, When Means Are Correlated

(a) Single Group Method

The last sections have dealt with the problem of determining whether the difference between two means is significant when these means represent the performance of different groups: boys and girls, Norwegians and Belgians, and the like. A closely related problem is concerned with the significance of the difference between two means obtained from the same test administered to the same group upon different occasions. This is called the “single group” method. Suppose, for example, that we have administered a test to a group of children and after two weeks have repeated the test. We wish to measure the effect of practice or of intervening training upon the final scores, or to estimate the effect of some activity interpolated between test and retest. In order to determine the significance of the difference between the means obtained in the initial and final testing, we must use the formula

$$\sigma_D = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2r_{12}\,\sigma_{M_1}\sigma_{M_2}} \qquad (33)$$
(standard error of the difference between correlated means)

in which $\sigma_{M_1}$ and $\sigma_{M_2}$ are the standard errors of the initial and final test means, and $r_{12}$ is the coefficient of correlation between scores made on the initial and final tests.* An illustration will bring out the difference between formula (29) and formula (33).
Example (4) At the beginning of the school year, the mean score of a group of sixty-five sixth-grade children upon an educational achievement test in reading was 45.00 with a σ of 6.00. At the end of the school year, the mean score on an equivalent form of the same test was 50.00 with a σ of 5.00. The correlation between scores made on the initial and final testing was .60. Has the class made significant progress in reading during the year?

We may tabulate our data as follows:

                                          Initial Test             Final Test
No. of children:                          65                       65
Mean score:                               45.00 ($M_1$)            50.00 ($M_2$)
Standard deviations:                      6.00 ($\sigma_1$)        5.00 ($\sigma_2$)
Standard error of the mean:               .75 ($\sigma_{M_1}$)†    .63 ($\sigma_{M_2}$)†
Difference between means:                 5.00
Correlation between initial and final tests: .60

* The correlation between the means of samples from a given population equals the correlation between the test scores, the means of which are being determined. See Kelley, T. L., Statistical Method (1923), p. 178.
† By formula (24a).
Substituting in formula (33), we get

$$\sigma_D = \sqrt{(.75)^2 + (.63)^2 - 2 \times .60 \times .75 \times .63} = .63$$

Since there are sixty-five children, there are sixty-five pairs of scores and sixty-five differences. The number of degrees of freedom, accordingly, is 65 − 1 or 64. The critical ratio, t or $D/\sigma_D$, is 5.00/.63, or 7.9. The t for N − 1 = 64 is 2.39 at the .02 level (Table 29). As our t is much larger than 2.39, the probability is far less than .01 that the gain (p. 202) can be attributed to sampling errors. It is clear, therefore, that this class made significant progress in reading during the school year.
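Formula (33) can be expressed as one Python function, shown in the sketch below. It uses the tabled SEs .75 and .63 from Example (4), so the result matches the text's rounded arithmetic. The same function also reproduces the equivalent-groups computation for Table 30 given further on. When r = 0, the function reduces to formula (29).

```python
import math


def se_diff_correlated(se_1, se_2, r):
    """SE of the difference between two correlated means, formula (33)."""
    return math.sqrt(se_1**2 + se_2**2 - 2 * r * se_1 * se_2)


# Example (4): SEs of the initial and final means (.75 and .63, as tabled), r = .60.
se_d_reading = se_diff_correlated(0.75, 0.63, 0.60)
t_reading = (50.00 - 45.00) / se_d_reading

# Equivalent-groups data of Table 30: SEs 2.89 and 2.57, r = .65, D = 5.39.
se_d_praise = se_diff_correlated(2.89, 2.57, 0.65)
t_praise = 5.39 / se_d_praise

print(f"reading gain: SE_D = {se_d_reading:.2f}, t = {t_reading:.1f} (64 df)")
print(f"praise vs. control: SE_D = {se_d_praise:.2f}, t = {t_praise:.2f} (71 df)")
```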

When groups are small, a method slightly different from that 
given above is to be preferred when we are evaluating the differ- 
ence between two correlated means. An example will serve as 
an illustration: 


Example (5) Twelve subjects are given five successive trials upon a symbol-digit learning test. Data for the first and the fifth trials are as follows:

             1st trial    5th trial    Diff. (5 − 1)
Means:       160.42       171.85       11.43
σ:                                     14.05

The mean gain is 11.43, and the SD around this mean is 14.05. Is the gain due to practice significant?

From formula (22), the SE of the mean gain ($\sigma_{M_D}$) is 4.35. On the null hypothesis (i.e., with respect to a mean gain of zero), we wish to test the significance of our gain of 11.43. The CR or t is 11.43/4.35, or 2.63. For 11 degrees of freedom [(N − 1) = 11], we find from Table 29 that a t of 2.72 (column .02) will be exceeded in the positive direction in 1% of the trials. From the column headed .10, we find that a t of 1.80 will be exceeded in the positive direction in 5% of the trials. Our mean gain of 11.43 (t = 2.63) is significant at the .05 level, therefore, and almost significant at the .01 level. Note again that we take entries from the .10 and .02 columns (for significance levels .05 and .01) when we are interested in the probability of a gain (positive difference) as large as or larger than 11.43.

In problems like this, dealing with the mean gain involves less calculation and is to be preferred to the method of calculating SE's for each mean, an SE of the difference, and the correlation between initial and final scores.
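The two routes (the SD of the gains, versus formula (33) with a correlation) give the same standard error. This holds as long as both are computed with the same conventions. The sketch below checks this on hypothetical paired scores that are invented purely for illustration and do not come from the text. As an assumption, it computes SDs with N in the denominator and uses (N − 1) in each SE, as in formula (24a). The text cites formula (22) for Example (5), and the equality holds under either convention provided both routes use the same one.

```python
import math

# Hypothetical paired scores for six subjects (invented for illustration, not from the text).
first = [12, 15, 11, 18, 14, 16]
fifth = [15, 19, 12, 22, 15, 21]


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Standard deviation with N in the denominator."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


n = len(first)
gains = [b - a for a, b in zip(first, fifth)]

# Difference method: a single SD of the gains, then the SE of the mean gain.
se_gain = sd(gains) / math.sqrt(n - 1)

# Formula (33) route: SE of each mean, the correlation r, then combine.
m1, m2 = mean(first), mean(fifth)
r = sum((a - m1) * (b - m2) for a, b in zip(first, fifth)) / (n * sd(first) * sd(fifth))
se_1, se_2 = sd(first) / math.sqrt(n - 1), sd(fifth) / math.sqrt(n - 1)
se_33 = math.sqrt(se_1**2 + se_2**2 - 2 * r * se_1 * se_2)

t = mean(gains) / se_gain
print(f"mean gain = {mean(gains):.2f}, SE (gains) = {se_gain:.4f}, SE (33) = {se_33:.4f}, t = {t:.2f}")
```

The difference method needs only one SD, whereas the formula (33) route needs two SDs and a correlation. This is why the text prefers the mean-gain approach for problems like Example (5).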

(b) Equivalent Groups Method 

Formula (33) is often employed in experiments which make use of the method of equivalent groups. The equivalent groups method enables us to evaluate the effect of one or more experimentally varied conditions (experimental factors) as compared with the absence of these factors (control conditions). The following problem is typical of many to which the equivalent group technique is applicable:

Example (6) Two groups, X and Y, of seventh-grade children are paired child for child for age and for score upon Form A of the Otis Group Intelligence Scale. Three weeks later, both groups are given Form B of the same test. Before the second test, Group X, the experimental group, is praised for its performance on the first test and urged to better its score if possible. Group Y, the control group, is given the second test without comment. Will the incentive (praise) serve to increase significantly the final score of Group X over Group Y?


The relevant data may be tabulated as follows: 


TABLE 30

                                           Experimental     Control
                                           Group X          Group Y
No. of children in each group:             72               72
Mean scores on Form A, initial test:       (illegible)      (illegible)
SD on Form A, initial test:                (illegible)      (illegible)
Mean scores on Form B, final test:         88.63 (M1)       83.24 (M2)
SD on Form B, final test:                  24.36 (σ1)       21.62 (σ2)
Gain, M1 − M2:                             5.39
Standard errors of means, final tests:     2.89             2.57
Correlation between final scores (experimental and control groups) = .65

The means and σ's of the control and experimental groups in Form A (initial test) are almost identical, showing the original pairing to have been quite satisfactory. The correlation be[…]


The difference (D) in the final mean test performance of the experimental and control groups is 88.63 − 83.24 or 5.39. The standard error of this D, $\sigma_D$, is found from formula (33) as follows:

$$\sigma_D = \sqrt{(2.89)^2 + (2.57)^2 - 2 \times .65 \times 2.89 \times 2.57} = 2.30$$

The t is 5.39/2.30, or 2.34, and there are 71 degrees of freedom. From Table 29 we find that the incentive group is significantly […] control groups.
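Formula (33) can be verified against Table 30 with a short Python sketch. It assumes each standard error of a mean is σ/√(N − 1), which reproduces the 2.89 and 2.57 in the table; the helper name `se_diff_correlated` is hypothetical. Setting r to zero gives the uncorrelated standard error, which is larger whenever r is positive.

```python
import math


def se_diff_correlated(se1, se2, r):
    """Formula (33): SE of the difference between two correlated means."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


# Table 30: final-test SDs, N = 72 per group, r between final scores = .65.
n = 72
se_x = 24.36 / math.sqrt(n - 1)   # about 2.89
se_y = 21.62 / math.sqrt(n - 1)   # about 2.57
sigma_d = se_diff_correlated(se_x, se_y, 0.65)
t = (88.63 - 83.24) / sigma_d
print(f"sigma_D = {sigma_d:.2f}, t = {t:.2f}, df = {n - 1}")

# Ignoring the correlation (r = 0) inflates the standard error.
sigma_d_uncorr = se_diff_correlated(se_x, se_y, 0.0)
print(f"sigma_D with r = 0: {sigma_d_uncorr:.2f}")
```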

When two equivalent groups are small (say 8, 10, or less), a good plan is to compute the differences between final scores made by the paired subjects and follow the method for testing the significance of a mean gain outlined on page 210. The degrees of freedom are one less than the number of pairs, […]

[…] although a large difference in N is not advisable. In evaluating the final scores of matched groups the procedure is somewhat different from that used in the equivalent groups method.*
Let X be the function or test under study, and Y be a variable in terms of which our two groups have been matched as to mean and SD. Then if $r_{xy}$ is the correlation between X and Y in the population from which our matched samples are drawn, the standard error of the difference between means in X is

$$\sigma_D = \sqrt{\left(\sigma_{M_1}^2 + \sigma_{M_2}^2\right)\left(1 - r_{xy}^2\right)} \qquad (34)$$

(standard error of the difference between the M's of matched groups)


An example will illustrate the use of this formula. 
Example (7) The achievement of two groups of first-year 
high-school boys, the one from an academic, the other from a 
technical high school, is compared upon a Mechanical Ability 
Test. The two groups are matched for mean and SD upon 
a general intelligence test so that the experiment becomes
one of comparing the mechanical ability scores of two groups 
of boys of “equal” general intelligence enrolled in different 


curricula. Data are as follows: 


TABLE 31

                                              Academic    Technical
No. of boys in each group:                    125         137
Means on Intelligence Test (Y):               102.50      102.80
σ's on Intelligence Test (Y):                 33.65       28.62
Means on Mechanical Ability Test (X):         51.42       54.38
σ's on Mechanical Ability Test (X):           6.24        7.14

Correlation between the General Intelligence Test and the Mechanical Ability Test for first-year high-school boys is .30.

$$M_1 - M_2 = 54.38 - 51.42 = 2.96$$

By (24a) and (34),

$$\sigma_D = \sqrt{\left(\frac{(6.24)^2}{124} + \frac{(7.14)^2}{136}\right)\left(1 - .30^2\right)} = .79$$

$$t \text{ or } CR = \frac{2.96}{.79} = 3.75$$
* Lindquist, E. F., "The Significance of a Difference between 'Matched' Groups," Journal of Educational Psychology, 22 (1931), 197-204.
Wilks, S. S., "The Standard Error of the Means of 'Matched' Samples," Journal of Educational Psychology, 22 (1931), 205-208.


Since the degrees of freedom (124 + 136 − 1, or 259)* are quite large, we may take the t values in the bottom line of Table 29, i.e., assume that the sampling distribution of CR's or t's is normal. Our critical ratio of 3.75 exceeds the .01 level (2.58), and the mean difference is, therefore, highly significant. We may assert with great confidence that boys in the technical high school are definitely better on the Mechanical Ability Test than boys of “equal” verbal intelligence in the academic high school.
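Formula (34) for matched groups can be sketched the same way. The helper name `se_diff_matched` is hypothetical, and, as in the worked example, each squared standard error of a mean is taken as σ²/(N − 1). Carried to full precision the ratio comes out near 3.74; the text's 3.75 results from dividing by the rounded .79, and the conclusion is the same.

```python
import math


def se_diff_matched(sd1, n1, sd2, n2, r_xy):
    """Formula (34): SE of the difference between means of groups matched on Y.

    Each squared SE of a mean is sd**2 / (N - 1), as in Example (7).
    r_xy is the correlation between the compared test X and the matching test Y.
    """
    var_m1 = sd1 ** 2 / (n1 - 1)
    var_m2 = sd2 ** 2 / (n2 - 1)
    return math.sqrt((var_m1 + var_m2) * (1 - r_xy ** 2))


# Example (7), Table 31: academic (N = 125, sigma = 6.24), technical (N = 137, sigma = 7.14).
sigma_d = se_diff_matched(6.24, 125, 7.14, 137, 0.30)
t = (54.38 - 51.42) / sigma_d
df = (125 - 1) + (137 - 1) - 1   # one df lost for the single matching variable
print(f"sigma_D = {sigma_d:.2f}, t = {t:.2f}, df = {df}")
```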

The correlation term is introduced in formula (34) because 
when two groups are matched in one function, their variability 
(SD) is restricted in those functions correlated with the match- 
ing test. For example, height and weight are highly correlated 
in nine-year-old children. Hence, if a group of nine-year-olds, 
of the same or nearly the same height is selected, the vari- 
ability in weight of this group will be substantially reduced as 
compared with nine-year-olds in general. When groups are 
matched for several variables, e.g., age, intelligence, socio- 
economic status and the like, and compared with respect to a 
correlated variable, the correlation coefficient in formula (34) 
becomes a multiple correlation coefficient (p. 423). 

Matched groups and more often equivalent groups have been 
employed in a variety of psychological and educational studies. 
Well-known illustrations are found in experiments designed to 
evaluate the relative merits of two methods of teaching, to 
determine the effect of drugs, e.g., tobacco or caffeine, upon 
efficiency, to investigate the transfer effects of special training 
and many other factors. If the critical ratio (t) in such studies
is significant when formula (29) is used, we may have con- 
fidence in our result, since the standard error given by formula 
(29) is always larger than the standard error obtained from 
formula (33) when r is positive. If the difference when for- 
mula (29) is used is not significant, however, it is still possible 
that it might prove to be so if the experiment were repeated 
under conditions changed so as to permit the calculation of the 
correlation between final scores. 


* One degree of freedom is subtracted for each variable (here one) 
in terms of which the groups are matched.




2. The Reliability of the Difference between Medians 
The reliability of the difference between two medians may 
be found from the following formula: 


$$\sigma_D \text{ or } \sigma_{Mdn_1 - Mdn_2} = \sqrt{\sigma_{Mdn_1}^2 + \sigma_{Mdn_2}^2} \qquad (35)$$

(standard error of the difference between uncorrelated medians)


3. The Reliability of the Difference between Standard Deviations

(1) The Standard Error of a Difference When σ's Are Uncorrelated

In many studies in psychology and education, the differences 
in variability which appear between groups are a matter of 
prime importance. The student of race and sex differences, for 
example, is often more interested in knowing whether his 
groups differ significantly in SD than in knowing whether they 
differ in mean score. In like manner, the educational psychologist who is investigating a new method of teaching often wants
to know whether his “new” method has produced changes in 
variability greater than those brought about by the “old” 
method. 

When different groups are studied, or when the tests given to 
the same group are uncorrelated, the reliability of an obtained 
difference may be found by the formula 

$$\sigma_{D_\sigma} \text{ or } \sigma_{\sigma_1 - \sigma_2} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2} \qquad (36)$$

(standard error of the difference between uncorrelated σ's)

where $\sigma_{\sigma_1}$ is the standard error of the first σ and $\sigma_{\sigma_2}$ is the standard error of the second σ (p. 194).

We may apply this formula to the problem of Norwegians and Belgians on page 198. The SD of the Norwegians' scores on the combined scale was 2.47; of the Belgians' scores on the same test, 2.42. Is this difference in variability significant? Calling the SD of the Norwegians' scores σ1 and the SD of the Belgians' scores σ2, we have, using large sample methods,


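$$\sigma_{\sigma_1} = \frac{2.47}{\sqrt{2 \times 611}} = .071 \quad \text{by (27)}$$

$$\sigma_{\sigma_2} = \frac{2.42}{\sqrt{2 \times 129}} = .151 \quad \text{by (27)}$$

$$\sigma_{D_\sigma} = \sqrt{(.071)^2 + (.151)^2} = .167 \text{ or } .17 \quad \text{by (36)}$$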


The obtained difference in the σ's is 2.47 − 2.42 or .05. Dividing this difference by .17, t = .05/.17 or .30. On the null hypothesis (Table 17), differences larger than $\pm .30\,\sigma_{D_\sigma}$ can be expected to occur about eight times in ten trials from sampling errors alone. The given difference is clearly not significant, therefore, and the null hypothesis is retained.
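A short Python sketch reproduces this large-sample test and attaches a normal-curve probability to the critical ratio. The helper names are hypothetical; `se_sd_large` applies formula (27), σ/√(2N), and the two-tailed probability uses the standard library's `math.erfc`. Full precision gives about .166 for the standard error of the difference, against the .167 obtained from the rounded components.

```python
import math


def se_sd_large(sd, n):
    """Formula (27): standard error of a standard deviation, sd / sqrt(2N)."""
    return sd / math.sqrt(2 * n)


def two_tailed_p(z):
    """Probability of a normal deviate at least |z| in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))


se_norw = se_sd_large(2.47, 611)                  # about .071
se_belg = se_sd_large(2.42, 129)                  # about .151
se_diff = math.sqrt(se_norw ** 2 + se_belg ** 2)  # formula (36), about .17
cr = (2.47 - 2.42) / se_diff
print(f"CR = {cr:.2f}, two-tailed P = {two_tailed_p(cr):.2f}")
```

A two-tailed P near .76 is the "about eight times in ten" of the text.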


(2) The Standard Error of a Difference When σ's Are Correlated

When we compare the SD's of the same group upon two occasions or the SD's of equivalent groups on a final test, we must take into account the correlation between the SD's of the groups being compared. The formula for testing the significance of an obtained difference in variability when SD's are correlated is

$$\sigma_{D_\sigma} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2 - 2r_{12}^2\,\sigma_{\sigma_1}\sigma_{\sigma_2}} \qquad (37)$$

(standard error of the difference between correlated σ's)

where $\sigma_{\sigma_1}$ and $\sigma_{\sigma_2}$ are the standard errors of the two SD's and $r_{12}^2$ is the square of the coefficient of correlation between scores in final and initial tests of the same group or between final scores of equivalent groups.*

Formula (37) may be applied to the problems on pages 209 
and 211. In the first problem (p. 209) the SD of the sixty-five 
sixth-grade children is 6.0 on the initial test and 5.0 on the final 
test. Is there a significant difference in variability in reading 

* The correlation between the SD's of samples drawn from a given population equals the square of the coefficient of correlation between the test scores, the SD's of which are being compared. See Kelley, T. L., Statistical Method (1923), p. 178.


after a year's schooling? If we call σ1 = 6.0 and σ2 = 5.0, we have

$$\sigma_{\sigma_1} = \frac{6.0}{\sqrt{2 \times 64}} = .53 \quad \text{by (27a)}$$

$$\sigma_{\sigma_2} = \frac{5.0}{\sqrt{2 \times 64}} = .44 \quad \text{by (27a)}$$


The coefficient of correlation between initial and final scores is .60, so that r² = .36. Substituting for r² and the σ's in formula (37), we have

$$\sigma_{D_\sigma} = \sqrt{(.53)^2 + (.44)^2 - 2 \times .36 \times .53 \times .44} = .55$$

The difference between the σ's (6.0 − 5.0 = 1.0) divided by $\sigma_{D_\sigma}$ gives t = 1.80.

The t for 64 degrees of freedom is 2.00 at P = .05. The computed t of 1.80 does not quite reach this point. Hence there is no reason for believing that a real difference in variability exists as between these two groups.

In the equivalent groups problem (p. 211) the SD of the 
experimental group on the final test was 24.36, and the SD of 
the control group on the final test was 21.62. The difference 
between these SD's is 2.74, and the number of children in each 
group is seventy-two. Did the incentive (praise) produce 
significantly greater variability in the experimental group as 
compared with the control? Putting σ1 = 24.36 and σ2 = 21.62,
we have 


$$\sigma_{\sigma_1} = \frac{24.36}{\sqrt{2(72 - 1)}} = 2.04 \quad \text{by (27a)}$$

$$\sigma_{\sigma_2} = \frac{21.62}{\sqrt{2(72 - 1)}} = 1.81 \quad \text{by (27a)}$$

The coefficient of correlation between final test scores in the experimental and control groups is .65, and r² is .42. Substituting for r² and the standard errors in formula (37) we have

$$\sigma_{D_\sigma} = \sqrt{(2.04)^2 + (1.81)^2 - 2 \times .42 \times 2.04 \times 1.81} = 2.08$$


If we divide 2.74 by 2.08, our critical ratio or t is 1.32. For 71 degrees of freedom this t (Table 29) is not significant at the P = .05 level (2.00), nor in the positive direction at the P/2 = .05 level (1.67). There is no evidence, then, that the incentive increased the variability of response.
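Both correlated-SD problems can be run through formula (37) with a small Python sketch. The function names are hypothetical; `se_sd_small` applies formula (27a), σ/√(2(N − 1)), and the correlation between the two SD's is entered as r², following the footnote to formula (37).

```python
import math


def se_sd_small(sd, n):
    """Formula (27a): standard error of an SD, sd / sqrt(2(N - 1))."""
    return sd / math.sqrt(2 * (n - 1))


def se_diff_correlated_sds(se1, se2, r):
    """Formula (37): the correlation between the two SDs is taken as r squared."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r ** 2 * se1 * se2)


def t_for_sd_difference(sd1, sd2, n, r):
    """t for the difference between two correlated SDs based on N cases (or pairs)."""
    se1, se2 = se_sd_small(sd1, n), se_sd_small(sd2, n)
    return (sd1 - sd2) / se_diff_correlated_sds(se1, se2, r)


# Reading problem: the same 65 children, initial vs final test, r = .60.
t_reading = t_for_sd_difference(6.0, 5.0, 65, 0.60)
# Equivalent groups problem: 72 pairs, final tests, r = .65.
t_praise = t_for_sd_difference(24.36, 21.62, 72, 0.65)
print(f"reading: t = {t_reading:.2f}; praise: t = {t_praise:.2f}")
```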


V. THE RELIABILITY OF CERTAIN OTHER MEASURES


This section will consider the standard errors of certain statistics which are used fairly often in experimental work. The reliability of r, the coefficient of correlation, will be treated in Chapter IX, page 297. For the standard errors of many other important measures the student should go to the more advanced references in the literature. The Handbook of Statistical Nomographs, Tables, and Formulas, by Dunlap and Kurtz, contains many formulas which are often needed in research investigations.


1. The Standard Error of a Percentage and the Standard 
Error of the Difference between Two Percentages 


It is often possible to find the percentage of a given group
which exhibits a certain attribute or possesses certain interests 
or attitudes, or other fairly general characteristics, when it is 
difficult if not impossible to measure these attributes directly. 
Given the percentage occurrence of an attribute, the question 
of how much confidence we can put in our figure often arises. 
How reliable an index is it of the incidence of the phenomenon 
in which we are interested? The standard error of a percentage 
is given by the formula 


$$\sigma_\% = 100\sqrt{\frac{pq}{N}} \qquad (38)$$

(standard error of a percentage)

in which p = the proportion of times the given event occurs; q = 1 − p; and N = the number of cases.
We may illustrate this formula with a problem: 


Example (1) In a study of cheating, a group of 613 ele- 
mentary school children were classified as to the occupations 




of their fathers. It was found that 348 children had fathers 
who were professional men, business men, merchants, etc. Of 
these 348 children of “good” social status, 144 or 41.4% were 
found to have cheated on various tests given in school. As- 
suming our sample to be representative of children from the 
given social level, how much confidence may be placed in the 
stability of this percent? How much fluctuation in percent 
cheating might be expected if we investigated a number of 
groups of children whose fathers fall into the same occupa- 
tional classification? * 


Applying formula (38), we get 


$$\sigma_\% = 100\sqrt{\frac{.414 \times .586}{348}} = 2.7\%$$


This standard error is interpreted as is $\sigma_M$ for large samples; that is, we assume the sampling distribution of CR's to be normal. On the evidence, therefore, the probability is .95 that the percentage of children cheating really lies between 36.1% and 46.7% (41.4 ± 1.96 × 2.7). Only five times in 100 trials would we expect a percentage to occur outside of these limits.
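Formula (38) and the .95 limits are easy to compute directly. This Python sketch uses hypothetical helper names and the large-sample normal multiplier 1.96. Carried to full precision, the first group's standard error comes out near 2.64 rather than the 2.7 used in the text, which moves the limits slightly (to about 36.2% and 46.6%) without changing the interpretation.

```python
import math


def se_percent(p, n):
    """Formula (38): standard error of a percentage, in percentage points."""
    return 100 * math.sqrt(p * (1 - p) / n)


def limits_95(p, n):
    """Approximate .95 limits for a percentage from a large sample (normal CR's)."""
    pct, se = 100 * p, se_percent(p, n)
    return pct - 1.96 * se, pct + 1.96 * se


se_good = se_percent(0.414, 348)   # children of "good" social status
low, high = limits_95(0.414, 348)
print(f"SE = {se_good:.2f}%, .95 limits {low:.1f}% to {high:.1f}%")
print(f"second group: SE = {se_percent(0.502, 265):.1f}%")
```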

We often want to know whether there is a significant differ- 
ence between the percentages of two groups who exhibit a 
certain form of behavior. When our two groups constitute 
samplings from what seem to be different populations, or when 
percentages are uncorrelated,† we may determine the significance of the difference between the percentages in the two groups by the formula:

$$\sigma_{D_\%} = \sqrt{\sigma_{\%_1}^2 + \sigma_{\%_2}^2} \quad \text{or} \quad \sigma_{D_\%} = 100\sqrt{\frac{p_1 q_1}{N_1} + \frac{p_2 q_2}{N_2}} \qquad (39)$$

(standard error of the difference between two uncorrelated percentages)


* Hartshorne, H., and May, M. A., Studies in Deceit (1928), Book II.

† If certain members of Group I are more likely to cheat when certain members of Group II cheat, percentages cheating in the two groups will be correlated.


We may illustrate the use of this formula by reference to Example (1) given above. It was stated that 41.4% of the 348 children, classified as of “good” social status, cheated on the tests given. In the same study, 50.2% of 265 children whose fathers were classified as skilled and unskilled laborers, i.e., were of relatively “poor” social status, cheated on the same tests of deception. Is there a “real” difference in “deceptive behavior” between these two groups? The $\sigma_\%$ for the percentage 50.2 (p = .502) in the second group is

$$\sigma_{\%_2} = 100\sqrt{\frac{.502 \times .498}{265}} = 3.1\% \quad \text{by (38)}$$

Calling 2.7 $\sigma_{\%_1}$ and 3.1 $\sigma_{\%_2}$, and substituting in formula (39), we have

$$\sigma_{D_\%} = \sqrt{(2.7)^2 + (3.1)^2} = 4.1\%$$


The difference between the percentages of those who cheated in the two groups is 50.2 − 41.4 or 8.8. Dividing 8.8 by 4.1, we obtain a CR of 2.15. Assuming the distribution of CR's to be normal (samples are large), we find from the bottom line of Table 29 that a CR of 2.15 is significant at the .05 level (1.96), but not at the .01 level (2.58).
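Formula (39) can also be evaluated in its second form, straight from the two proportions. In this hypothetical Python sketch, the second form gives about 4.05 and a CR near 2.17, while the first form with the text's rounded 2.7 and 3.1 gives 4.1. Both lead to the same verdict: significant at .05, not at .01.

```python
import math


def se_diff_percent(p1, n1, p2, n2):
    """Formula (39), second form: SE of the difference between uncorrelated percentages."""
    return 100 * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


se_d = se_diff_percent(0.414, 348, 0.502, 265)
se_d_text = math.hypot(2.7, 3.1)   # first form, with the text's rounded SE's
cr = (50.2 - 41.4) / se_d
for level, z in ((".05", 1.96), (".01", 2.58)):
    print(f"CR = {cr:.2f}; significant at {level}: {cr > z}")
```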


2. The Standard Errors of Measures of Skewness and of 
Kurtosis 


(1) Skewness 

In Chapter V, page 121, a formula for estimating the skewness 
of a frequency distribution in terms of its median and certain 
percentiles was given as follows: 


$$Sk = \frac{P_{90} + P_{10}}{2} - P_{50} \qquad (19)$$

According to this formula, the skewness of the 50 Army Alpha scores (the distribution is given in Table 1, p. 6) is −2.50.


The significance of this measure of skewness may be determined by means of the formula

$$\sigma_{Sk} = \frac{.5185\,D}{\sqrt{N}} \qquad (40)$$

(standard error of the measure of skewness given in formula (19)*)

in which $D = (P_{90} - P_{10})$.
In the frequency distribution of 50 Army Alpha scores, $P_{90}$ is 187, $P_{10}$ is 152, and D = 35. From formula (40), therefore,

$$\sigma_{Sk} = \frac{.5185 \times 35}{\sqrt{50}} = 2.57$$

and dividing −2.50 (Sk) by 2.57 ($\sigma_{Sk}$), we get a t of .97 (the sign of Sk indicates the direction of skewness). Assuming the distribution of t's to be normal, it is clear from Table 29 that this t falls far short of the .05 level. We may feel quite sure, then, that the distribution is not significantly skewed.

The skewness of the distribution of 200 cancellation scores (p. 14) is, by formula (19), .03; $P_{90}$ = 128.5, $P_{10}$ = 110.4, and D = 18.1. The standard error of Sk is

$$\sigma_{Sk} = \frac{.5185 \times 18.1}{\sqrt{200}} = .66$$

Dividing .03 (Sk) by .66 ($\sigma_{Sk}$), we get a t of .046; and from Table 29 we find that the skewness is far from being significant. In fact this distribution is almost perfectly symmetrical (Fig. 5, p. 20, verifies this result).
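Formulas (19) and (40) translate directly into Python. The helper names are hypothetical. The text does not list the two medians, so the loop below takes the reported Sk values as given and only computes their standard errors and ratios; `skewness_percentile` is included so the sketch follows formula (19) as well.

```python
import math


def skewness_percentile(p90, p10, p50):
    """Formula (19): Sk = (P90 + P10) / 2 - P50."""
    return (p90 + p10) / 2 - p50


def se_skewness(p90, p10, n):
    """Formula (40): SE of Sk = .5185 * D / sqrt(N), with D = P90 - P10."""
    return 0.5185 * (p90 - p10) / math.sqrt(n)


# Army Alpha (N = 50) and cancellation (N = 200) figures from the text.
for label, sk, p90, p10, n in (("Army Alpha", -2.50, 187, 152, 50),
                               ("cancellation", 0.03, 128.5, 110.4, 200)):
    se = se_skewness(p90, p10, n)
    print(f"{label}: SE = {se:.2f}, t = {abs(sk) / se:.3f}")
```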
(2) Kurtosis 

On page 122 the following formula was given for measuring 
the kurtosis of a distribution in terms of Q and certain per- 
centiles: 

$$Ku = \frac{Q}{P_{90} - P_{10}} \qquad (20)$$


* Kelley, T. L., Statistical Method (1923), p. 77. The formula, as given in this text, is from Dunlap, J. W., and Kurtz, A. K., Handbook of Statistical Nomographs, Tables and Formulas (1932), p. 112.


The kurtosis of the frequency distribution of 50 Army Alpha scores by formula (20) is given on page 122 as .237. This value deviates −.026 from the Ku of the normal distribution, which is .263 (to three decimals). The direction of the deviation indicates that the distribution is leptokurtic.

We may estimate the significance of our deviation of −.026 from “normal” kurtosis by calculating $\sigma_{Ku}$, using the following formula:

$$\sigma_{Ku} = \frac{.27779}{\sqrt{N}} \qquad (41)$$

(standard error of the measure of Ku given by formula (20))

in which N is the size of the sample.

For the fifty Army Alpha scores, $\sigma_{Ku} = .27779/\sqrt{50} = .039$, and t, or $Ku_d/\sigma_{Ku}$, = .026/.039 or .67. Assuming a normal sampling distribution for t, .67 is well below the .05 level (Table 29), and the deviation (“peakedness”) of this frequency distribution from the normal form is not significant.

The kurtosis of the 200 cancellation scores (p. 122) is by formula (20) .223, a value which deviates −.040 from .263, the Ku of the normal distribution. The direction of the deviation indicates leptokurtosis.

To determine the significance of this deviation from normality, calculate $\sigma_{Ku}$, which equals .020. $Ku_d/\sigma_{Ku}$ equals .040/.020 or 2.00, and from Table 29 we find that the leptokurtosis of the distribution is significant at the .05 level (P/2 = 1.65) but not at the .01 level. The narrow dispersion of this distribution (Q = 4.04), leading to a concentration of cases in the middle range, probably accounts for its strong tendency to be more “peaked” than the normal distribution (see p. 122).
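The value .263 and the test of formula (41) can be checked with Python's standard `statistics.NormalDist`. This is a sketch with hypothetical helper names: it derives the normal Ku from the unit normal's quartiles and 10th and 90th percentiles, then tests the cancellation distribution. Full precision gives a t of about 2.03 rather than the 2.00 obtained from the rounded .040 and .020.

```python
import math
from statistics import NormalDist


def kurtosis_percentile(q, p90, p10):
    """Formula (20): Ku = Q / (P90 - P10)."""
    return q / (p90 - p10)


def se_kurtosis(n):
    """Formula (41): SE of Ku = .27779 / sqrt(N)."""
    return 0.27779 / math.sqrt(n)


# Ku of the unit normal curve: Q is half the interquartile range.
z = NormalDist()
q_normal = (z.inv_cdf(0.75) - z.inv_cdf(0.25)) / 2
ku_normal = kurtosis_percentile(q_normal, z.inv_cdf(0.90), z.inv_cdf(0.10))
print(f"normal Ku = {ku_normal:.3f}")

# Cancellation scores (N = 200): Q = 4.04, P90 = 128.5, P10 = 110.4.
ku = kurtosis_percentile(4.04, 128.5, 110.4)
t = (ku_normal - ku) / se_kurtosis(200)
print(f"Ku = {ku:.3f}, SE = {se_kurtosis(200):.3f}, t = {t:.2f}")
```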


VI. SAMPLING AND THE USE OF RELIABILITY FORMULAS


All of the reliability formulas given in this chapter depend upon N, the number of cases in the sample, and most of them involve some measure of variability (usually σ) calculated from the data. It is unfortunate, perhaps, that given these statistics there is nothing in the statement of a reliability formula itself which might deter the uncritical worker from applying it to any set of test scores. General and indiscriminate calculation of standard errors, however, will lead to erroneous conclusions and false interpretations. For this reason, it is important that the research worker in experimental psychology or in education have clearly in mind (1) the conditions under which reliability formulas are, and are not, applicable; and that he know (2) what his formulas may reasonably be expected to do.* Some of the limitations to reliability formulas have been pointed out in this chapter. These statements will now be amplified and certain cautions to be observed in the use of reliability formulas indicated.


1. Reliability Formulas Assume Random Samples 

Reliability formulas apply strictly to random samples only: 
when other sampling methods have been employed, special 
techniques must be used in determining significance levels. 
The criterion of randomness in a sample is met when every 
person in the population from which the sample has been drawn 
has had an equal chance of being chosen. A random sample is 
truly representative of its population, since cases are chosen 
without bias as to able, mediocre, and poor individuals. It may 
seem paradoxical, but one must often take great pains to 
"select" his sample randomly. To be representative of ten- 
year-old boys within a given city, for example, a group must not 
be drawn exclusively from a poor neighborhood, from expensive 
private schools, or from any larger group in which special 
factors are known to play an important rôle.

Mental traits which have been carefully measured in large 
samples have usually proved to be normally or approximately 

* Walker, Helen M., Elementary Statistical Method (1943), Chapter 15, pp. 263-271.

† McNemar, Q., “Sampling in Psychological Research,” Psychological Bulletin, 37 (1940), 331-365.


normally distributed. We may make the reasonable assump- 
tion, therefore, that many of the traits in which we are inter- 
ested follow the normal distribution in the general population. 
Random samples drawn from a normally distributed population 
will also be normally distributed, so that normality becomes one 
criterion of adequacy in a sample. The range covered by
samples of different sizes (all drawn from a normal population) 
will be approximately as follows: 


N = 10       Range ±2.0σ
N = 50       Range ±2.5σ
N = 200      Range ±3.0σ
N = 1000     Range ±3.5σ


A range of ±3.5σ from the mean includes, in a normally distributed group, 9995 cases in 10,000 (Table 17). The same range includes, of course, 99.95% of the cases in a sample of 100. In the sample of 10,000, five cases fall outside of this range; in a sample of 100, no cases lie outside of the given range. The more extreme the deviation, the less the probability of its occurrence; and in small samples, wide deviations from the mean rarely appear if the sample is truly representative of a normally distributed group. When working with small samples, therefore, deviations far removed from the mean should often be discarded, much as a laboratory worker throws out measures of reaction time which are obviously premature or delayed.
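The counts in this paragraph follow from the normal curve. A minimal Python sketch using the standard library's `statistics.NormalDist` (the helper name `fraction_outside` is hypothetical) shows that about 4.65 cases in 10,000 lie beyond ±3.5σ, while fewer than one case in twenty is expected beyond that range in a sample of 100.

```python
from statistics import NormalDist


def fraction_outside(k):
    """Fraction of a normal population lying beyond +/- k sigma from the mean."""
    return 2 * (1 - NormalDist().cdf(k))


for n in (100, 10_000):
    expected = n * fraction_outside(3.5)
    print(f"N = {n}: about {expected:.2f} cases expected beyond +/-3.5 sigma")
```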

One of the simplest tests of the adequacy (the representativeness) of a sample consists in drawing from the population another group of approximately the same size as the sample with which we are working. If the means and sigmas computed from these two independently drawn groups are of almost the same size, we may feel reasonably sure that both samples are representative of the population. If the correspondence is not close, we may try the expedient of adding new cases to our samples until they yield means and σ's which are increasingly similar or increasingly dissimilar. In the latter event neither sample is likely to be adequate. More information may be secured with respect to the reliability of a mean or σ by repeated sampling, or by a careful study of several samples, than can be obtained from an uncritical and blanket use of reliability formulas.


2. Reliability Formulas Assume a “Sufficiently Large” Sample

The value of a standard error is conditioned, in part at least, upon our having a sufficiently large sample. A small sample may be satisfactory in intensive laboratory studies in which many measurements are taken on each subject. But if N is less than about 25, there is usually little reason for assuming such a small sample to be descriptive of a given population. As we have seen (p. 183), standard errors vary inversely as the square root of the size of the sample; hence, the larger the sample in general the smaller the error. A fairly simple and practical method of deciding when a sample is “sufficiently large” is to increase N until the addition of extra cases drawn at random fails to produce an appreciable change in the mean or σ. When this point is reached, the sample is probably large enough to be taken as descriptive of its population. But the corollary must be recognized that mere numbers do not in themselves guarantee a representative sample.*
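The "increase N until the mean and σ settle" rule can be sketched as a simple loop. Everything specific here is an assumption for illustration: the hypothetical `draw` callable stands in for random selection from the population (a simulated normal population with mean 100 and SD 15), and the starting size, batch size and tolerance are arbitrary. The loop only checks that the statistics stabilize; as the text warns, it cannot show that the sample is representative.

```python
import random
import statistics


def grow_until_stable(draw, start=25, step=25, tol=0.5, max_n=5000):
    """Add randomly drawn cases in batches until mean and SD each change by less than tol.

    draw: zero-argument callable returning one randomly drawn score (hypothetical).
    Returns (N, mean, SD) at the point where the sample stopped changing appreciably.
    """
    sample = [draw() for _ in range(start)]
    mean, sd = statistics.fmean(sample), statistics.pstdev(sample)
    while len(sample) < max_n:
        sample.extend(draw() for _ in range(step))
        new_mean, new_sd = statistics.fmean(sample), statistics.pstdev(sample)
        if abs(new_mean - mean) < tol and abs(new_sd - sd) < tol:
            return len(sample), new_mean, new_sd
        mean, sd = new_mean, new_sd
    return len(sample), mean, sd


rng = random.Random(1947)   # fixed seed so the illustration is repeatable
n, m, s = grow_until_stable(lambda: rng.gauss(100, 15))
print(f"stopped at N = {n}: mean = {m:.2f}, SD = {s:.2f}")
```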


3. Reliability Formulas Measure Fluctuations Arising from 
Sampling and from Errors of Measurement 


Standard errors of means, σ's, etc., measure both (1) sampling errors, and (2) errors of measurement, i.e., variable errors in the test scores themselves (p. 394). We have already considered the question of the sampling error of the mean on page 184. If a sample were perfectly representative of its population, its mean and σ would equal the mean and σ of the population. Except by chance, however, neither a given sample nor another similarly selected and approximately of the same size will describe the

* See The New Science of Public Opinion Measurement (American Institute of Public Opinion, Princeton, N. J.).


entire population perfectly. Moreover, it is unlikely that means calculated from successive samples will equal each other. Uncertainty as to the reliability of a calculated measure grows out of the fact that we must necessarily work with samples instead of with the whole population. Variations from sample to sample, the so-called “errors of sampling,” are not to be thought of as mistakes, failures and the like, but rather as fluctuations which arise from the fact that no two samples are ever exactly alike. If samples are random and sufficiently large, and if there is no constant error, calculated means will tend to vary around the true mean of the population within a comparatively small range. This range is given by the standard error. The accuracy limits of a mean (p. 187) should be calculated from the t distribution (Table 29) when N is small, and from the normal probability distribution when N is large.

If the standard error of a mean is large, it does not follow necessarily that the mean is affected by a large sampling error. Much of the error may be due to errors of measurement. On the other hand, when errors of measurement are known to be negligible, a small standard error does indicate that the reliability of a calculated measure is high insofar as sampling fluctuations are concerned. In other words, when the standard error is small a mean or σ is a good estimate of the population mean or σ.


4. Reliability Formulas Do Not Measure the Effects of Con- 
stant Errors Nor the Failure to Get a Random Sample 

Errors which arise from inadequate sampling are neither 
detected nor measured directly by reliability formulas. For 
example, the mean score on an intelligence test made by 500 
male college students between the ages of eighteen and twenty- 
five will not be representative of the achievement of the male 
population within this age range. College students constitute 
a highly selected group; and in consequence, other samples of 
500 young men, aged eighteen to twenty-five, and drawn at 
random from the male population will return very different 


means and sigmas from those obtained with the college group. These differences in mean and σ cannot be attributed to sampling errors, since samples were not drawn at random from the same population. If our population were restricted to college men, our original sample of 500 might, of course, be entirely adequate.

Reliability formulas are affected by, but do not reveal, constant errors. Constant errors work in only one direction, that is, they are always plus or always minus. Constant errors arise from many sources: familiarity with test material, fatigue, faulty technique in giving and scoring tests (over- and under-timing are examples), in fact, from a consistent bias of almost any sort. Standard errors calculated for measures subject to such influence, when not definitely misleading, are at best of doubtful value. The careful study of successive samples, rechecks when practicable, care in controlling conditions, and the use of objective checks will eliminate many prolific and troublesome sources of constant error. The research worker should always bear in mind that even the most refined statistical technique cannot make bad data yield valid results.


PROBLEMS 


1. Given: M = 26.40; σ = 3.20; N = 100.
(a) Determine the accuracy limits of this M at the .05 level; at the .01 level.
(b) Determine the accuracy limits of σ at the .05 level and .01 level.

2. Given: Mdn = 72.40; Q = 12.84; N = 81.
(a) Determine the accuracy limits of Mdn at the .05 level; at the .01 level.
(b) Determine the accuracy limits of Q at the .05 level.

3. The mean of a large sample is […] and $\sigma_M$ is 2.50. What are the chances that the sample mean misses the true mean by more than (a) ±1.00; (b) ±3.00; (c) ±10.00?

4. The following five measures of perception span for unrelated words are obtained from one observer:
5 6 4 7 5


(a) Determine .05 and .01 accuracy limits for the mean (page 201).
(b) Determine .05 accuracy limits for the SD.
(c) Compare the .05 accuracy limits for the mean when calculated by large and by small sample methods.

5. The difference between two means (M1 − M2) is 3.60, the $\sigma_D$ = 3.00, and the samples are large.
(a) Is the obtained difference significant at the .05 level?
(b) What percent is the obtained difference of the difference necessary for significance at the .01 level?


6. A personality inventory is administered in a private school to eight boys whose conduct records are exemplary, and to five boys whose records are very poor. Data are given below.

Group 1:  110  112  95  105  11  97  112  102
Group 2:  115  112  109  112  117


Is the difference between group means significant at the .05 level? 
at the .01 level? 


7. In the first trial of a practice period, twenty-five twelve-year-olds have a mean score of 80.00 and a σ of 8.00 upon a digit-symbol learning test. On the tenth trial, the mean is 88.00 and the σ is 10.00. The r between scores on the first and tenth trials is .40.
(a) Is the gain in score significant at the .05 level? at the .01 level?
(b) Is the increase in variability significant at the .05 level? at the .01 level?


8. Two groups of high-school pupils are matched for initial ability in a biology test. Group 1 is taught by the lecture method, and Group 2 by the lecture-demonstration method. Data are as follows:

                                           Group 1      Group 2
                                          (control)  (experimental)
N                                             60           60
Mean initial score on the biology test      42.30        42.50
σ of initial scores on the biology test      5.36         5.38
Mean final score on the biology test        54.54        56.74
σ of final scores on the biology test        6.34         7.25

r (between final scores on the biology test) = .50




(a) Is the difference between the final scores made by Groups 1 and 2 
upon the biology test significant at the .05 level? at the .01 level? 

(b) Is the difference in the variability of the final scores made by 
Groups 1 and 2 significant at the .05 level? 

9. Two groups of high-school students are matched for M and σ upon a group intelligence test. There are fifty-eight subjects in Group A and seventy-two in Group B. The records of these two groups upon a battery of “learning” tests are as follows:

        Group A    Group B
M        48.52      53.61
σ        10.60      15.35
N           58         72

The correlation of the group intelligence test and the learning battery in the entire group from which A and B were drawn is .50. Is the difference between Groups A and B significant at the .05 level? at the .01 level?

10. Calculate measures of skewness and kurtosis for each of the four 
distributions in Chapter II, problem 1, page 46. Compute 
standard errors of Sk and Ku by the formulas given on pages 221 
and 222. Determine whether any of these distributions departs 
significantly from the normal form. 

11. In a city high school of 5000 pupils, 52.3% are girls; and in a 
second high school of 3000 pupils, 47.7% are girls. Is there a 
significant difference between the percentages of girls enrolled in 
the two high schools? 

12. In an institution, eighty delinquent and eighty non-delinquent boys of the same age, same I.Q., and roughly the same social status furnish the following data:

(a) 40% of the delinquent, and 20% of the non-delinquent come
from “poor” homes.

(b) 74% of the delinquent and 44% of the non-delinquent score
above the “normal” median on a neurotic inventory.

(c) 65% of the delinquent and 50% of the non-delinquent cheat 
on a certain test. 

Are any of these differences significant? 

13. In a random sample of 100 cases each from four groups, A, B, C 
and D, the following results were obtained: 




          A        B        C        D
Mean   101.00   104.00    93.00    86.00
σ       10.00    11.00     9.60     8.50


What are the chances that, in general, the mean of
(a) the B's is higher than the mean of the A's;
(b) the A's is higher than the mean of the C's;
(c) the C's is higher than the mean of the D's?


What are the chances that
(a) a B will be better than the mean A;
(b) a B will be better than the mean C;
(c) a B will be better than the mean D?


ANSWERS 

1. (a) 25.77 and 27.03; 25.57 and 27.23
   (b) 2.75 and 3.65; 2.61 and 3.79
2. (a) 67.21 and 77.59; 65.60 and 79.20
   (b) 9.59 and 16.09
3. 69 in 100; 23 in 100; less than 1 in 100
4. (a) 3.98 and 6.82; 3.05 and 7.75
   (b) .14 and 2.14
   (c) 4.50 and 6.30; 3.98 and 6.82
5. (a) No. CR = 1.20
   (b) 46.5%
6. t = 2.3; significant at .05 but not at .01 level
7. (a) D/σ_D = 3.92; significant at .05 and at .01 levels
   (b) D/σ_D = 1.18; not significant at .05 level
8. (a) t = 2.47; significant at .05 but not at .01 level
   (b) D/σ_D = 1.18; not significant at .05 level
9. t = 2.57; significant at .05 and at .01 levels




10. Distribution   Sk/σ_Sk   Ku/σ_Ku
          1           −.23      1.55    Deviation from normality not significant
          2            .51       .38    Deviation from normality not significant
          3            .33       .93    Deviation from normality not significant
          4            .13       .68    Deviation from normality not significant




11. D/σ_D% = 4.0; significant at .01 level
12. (a) D/σ_D% = 2.83; significant at .01 level
    (b) D/σ_D% = 4.05; significant at .01 level
    (c) D/σ_D% = 1.94; almost significant at .05, not at .01, level
13. (a) 98 in 100
    (b) more than 99 in 100
    (c) more than 99 in 100
    (a) 61 in 100
    (b) 84 in 100
    (c) 95 in 100


CHAPTER VIII 


TESTING EXPERIMENTAL HYPOTHESES 


A PSYCHOLOGICAL experiment is designed to answer some question which the investigator has in mind. The investigator's hypothesis may be in the nature of a general proposition or it may be a specific query. A specific hypothesis is, ordinarily, to be preferred to a general one, as the more definite and exact the thesis the greater the likelihood of a conclusive answer. In the preceding chapter we were concerned with testing hypotheses concerning differences of various sorts: differences between means, σ's, percentages, and the like. The significance of obtained differences was tested by calculating a critical ratio which was evaluated in terms of the normal distribution (p. 115) or the t distribution (p. 190). In the present chapter we shall consider somewhat more carefully the nature of hypotheses and shall present certain useful ways of answering the questions raised by an experiment.


I. THE NULL HYPOTHESIS

1. Meaning of the Null Hypothesis 

We have already had occasion to employ the null hypothesis in Chapter VII, where the significance of the differences between two groups was to be tested. The null hypothesis, it will be remembered, asserts that no true difference exists as between our two samples; that, in fact, these samples were randomly drawn from the same population, and differ only by accidents of sampling. A null hypothesis, therefore, constitutes a challenge; and the function of an experiment is to give the facts a chance to meet (or fail to meet) this challenge. To illustrate, suppose it has been claimed that ten-year-old girls read better than ten-year-old boys. This hypothesis is indefinite as it stands, and hence is not testable, as we do not know how much better than boys the girls must read before they can be said to


“read better.” If we assert that girls read no better than boys or, to say the same thing, that such differences as are found in reading ability as between groups of ten-year-old boys and girls can be attributed to accidents of sampling, this (null) hypothesis is exact and can be tested by the usual sampling formulas. Suppose that groups of ten-year-old boys and girls are drawn at random from the school population, and that on a standard reading examination the mean score of girls is significantly higher than the mean score of boys. If this happens the null hypothesis is disproved and must be rejected. In discarding the null hypothesis what we are really saying is that the difference in reading achievement as between boys and girls cannot be fully explained by sampling fluctuations.

It is important to realize that the rejection of a null hypothesis does not force the acceptance of a contrary view.* A significant difference in reading ability as between ten-year-old boys and girls, for instance, does not prove girls to be better readers; it simply means that the two groups do actually differ. In subsequent comparisons of boys and girls, if all experimental variables likely to influence the reading score are controlled and the difference still remains, we may then be willing to assert the existence of a true sex difference in reading ability. But the acceptance of a positive hypothesis, it should be noted, is usually the end result of a series of experiments. Furthermore, it is a logical and not a statistical conclusion.

The extra-sensory perception (ESP) experiments† offer a good illustration of the meaning of a null hypothesis. In a typical experiment in ESP a pack of twenty-five cards is used. There are five different symbols on these cards, each symbol appearing on five cards. In guessing through a pack of cards, the probability of chance success with each card is 1/5, and the number of correct “calls” in a pack of twenty-five cards should on the average be five. If a subject calls the cards


* Morgan, J. J. B., “Credence Given to One Hypothesis because of the Overthrow of Its Rivals,” American Journal of Psychology, 58 (1945), 54-64.
† Rhine, J. B., et al., Extra-Sensory Perception after Sixty Years (New York: Henry Holt and Co., 1940).




correctly considerably in excess of chance expectation (i.e., in excess of five), the null hypothesis is rejected. But rejection of the null hypothesis does not force immediate acceptance of ESP as the cause of extra-chance results. Before this conclusion can be reached we must demonstrate in a series of experiments that extra-chance results are obtained when all other likely causes, such as runs of cards, cues, poor shuffling and recording, and the like, have been eliminated. If under rigid controls results in excess of chance are consistently achieved, we may reject the null hypothesis and accept ESP. But the acceptance of ESP, as of any positive hypothesis, is necessarily tentative and is contingent upon further work.
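As a hedged illustration of what “considerably in excess of chance” can mean for a single pack, the sketch below applies the exact binomial method of the next section to n = 25 calls with p = 1/5 per call, assuming the calls are independent. It prints the chance mean and SD and finds the smallest number of hits that chance alone would reach or exceed no more than 5 times in 100. The .05 cutoff and the helper names upper_tail and smallest_significant are assumptions for illustration, not part of the text.

```python
from math import comb, sqrt

def upper_tail(k: int, n: int, p: float) -> float:
    """Exact binomial P(X >= k): chance of k or more hits in n independent calls."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def smallest_significant(n: int, p: float, alpha: float = 0.05) -> int:
    """Smallest number of hits that chance alone reaches or exceeds with probability <= alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return n + 1  # no attainable score is that improbable

n, p = 25, 1 / 5                          # one pack: 25 cards, 5 equally likely symbols
mean, sd = n * p, sqrt(n * p * (1 - p))   # chance expectation and its SD
k = smallest_significant(n, p)
print(f"chance mean = {mean:.1f} hits, SD = {sd:.1f}")
print(f"{k} or more hits: P = {upper_tail(k, n, p):.4f}")
```

Under these assumptions the cutoff falls at nine hits, two SDs above the chance mean of five.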

Ordinarily, the null hypothesis is more useful than other hypotheses because it is exact. Hypotheses which assert that some group is “better” or “more accurate” or “more skilled” than another are inexact and cannot be tested, as we cannot quantify our expected finding. Hypotheses other than the null hypothesis can, to be sure, be made exact: we may, for example, assert that a group which has received special training will be five points on the average better than an untrained (control) group. It is difficult, however, to set up such precise expectations in most experiments; and for this reason it is advisable to adopt the null hypothesis in preference to others if this can be done.


2. Testing the Null Hypothesis against the Direct Determination of Probable Outcomes


The null hypothesis can often be efficiently tested by com- 
paring experimentally observed results with those to be ex- 
pected from probability theory. Several examples will illustrate 
the methods to be employed. 


Example (1) Two tones, differing slightly in pitch, are to 
be compared in an experiment. The tones are presented in 
succession, the subject being instructed to report the second 
as higher or lower than the first. Presentation is in random 
order. In ten trials a subject is right in his judgment seven 
times. Is this result significant, i.e., better than chance? 




Since the subject is either right or wrong in his judgment, and since judgments are separate and independent, we may test our result against the binomial expansion (p. 104). Ten judgments may be taken as analogous to ten coins; a right judgment corresponds to a head, say, a wrong judgment to a tail. The odds are even that any given judgment will be right; hence in ten trials (since p = 1/2) our subject should in general be right five times by chance alone. The question, then, is whether seven “rights” are significantly greater than the expected five. From page 108 we find that upon expanding $(p + q)^{10}$ the probability of ten right judgments is 1/1024; of nine right and one wrong, 10/1024; of eight right and two wrong, 45/1024; and of seven right and three wrong, 120/1024. Adding these fractions we get 176/1024, or .172, as the probability of seven or more right judgments by chance alone. The probability of just seven rights is 120/1024 or approximately .12. Neither of these results is significant at the .05 level of confidence (p. 201) and accordingly the null hypothesis must be retained. On the evidence there is no reason to believe that our subject's judgments are really better than chance expectation.

Note that to get ten right is highly significant (the probability is approximately .001); to get nine or ten right is also significant (the probability is 1/1024 + 10/1024 or approximately .01). To get eight or more right is almost significant at the .05 level (the probability is .055); but any number right less than eight fails to reach our standard. The situation described in Example (1) occurs in a number of experiments: whenever, for example, objects, weights, lights, test items, or other stimuli are to be compared, the odds being 50:50 that a given judgment is correct.
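The fractions quoted above can be checked by counting. Since p = q = 1/2, all 2^10 = 1024 right/wrong sequences are equally likely, and the number containing exactly i rights is the binomial coefficient C(10, i). The short sketch below (the helper name tail_count is illustrative) sums those counts for the upper tail.

```python
from math import comb

def tail_count(k: int, n: int) -> int:
    """Number of the 2**n equally likely right/wrong sequences with k or more rights."""
    return sum(comb(n, i) for i in range(k, n + 1))

n = 10
total = 2**n  # 1024 equally likely outcomes when p = q = 1/2
for k in (10, 9, 8, 7):
    count = tail_count(k, n)
    print(f"{k} or more right: {count}/{total} = {count / total:.3f}")

# Probability of exactly seven right
print(f"exactly 7 right: {comb(n, 7)}/{total} = {comb(n, 7) / total:.3f}")
```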


Example (2) Ten photos, five of feeble-minded and five 
of normal children (of the same age and sex), are presented 
to a subject who claims he can identify the feeble-minded 
from their photographs. The subject is instructed to
designate which five photographs are those of feeble-minded
children. How many photos must our subject identify cor- 
rectly before the null hypothesis is disproved? 




Since there are five feeble-minded and five normal photos, the subject has a 50:50 chance of success with each photo and the method of Example (1) could be used. A better test,* however, is to determine the probability that a particular set of five photos (namely, the right five) will be selected from all possible sets of five which may be drawn from the ten given photos. To find how many combinations of five photos can be drawn from a set of ten, we may conveniently use the formula for the combination of ten things taken five at a time. This formula† is written $C^{10}_{5} = \frac{10!}{5!\,5!} = 252$. The symbol $C^{10}_{5}$ is read “the combinations of ten things taken five at a time”; 10! (read “10 factorial”) is 10·9·8·7·6·5·4·3·2·1; and 5! is 5·4·3·2·1.

It is possible, therefore, to draw 252 combinations of five from a set of ten, and accordingly there is one chance in 252 that a judge will select the five correct photos out of all possible sets of five. If he does select the right five, this result is obviously significant (the probability is approximately .004) and the null hypothesis must be rejected. Suppose that our judge's set of five photos contains four feeble-minded and one normal picture, or three feeble-minded and two normal pictures. Is either of these results significant? The probability of four right selections and one wrong selection by chance is $\frac{C^{5}_{4} \times C^{5}_{1}}{C^{10}_{5}}$; i.e., the product of the number of ways four rights can be selected from the five feeble-minded pictures times the number of ways one wrong can be selected from the five normal pictures, divided by the total number of combinations of five. Calculation shows this result to be 25/252 or 1/10 (approximately) and hence not significant at the .05 level. The probability of getting three right and two wrong is given by $\frac{C^{5}_{3} \times C^{5}_{2}}{C^{10}_{5}}$; namely, the product of the number of ways three pictures can be selected from five (the five feeble-minded pictures) times the number of ways two pictures can be selected from the five normal pictures, divided by the total number of combinations of five. This result is 100/252, or approximately .40, and is clearly not significant.

* Fisher, R. A., The Design of Experiments (1935), Chapter 2, pp. 26-29 especially.
† The general formula for the combinations of n things taken r at a time is $C^{n}_{r} = \frac{n!}{r!\,(n - r)!}$.

Our subject disproves the null hypothesis, then, only when 
all five feeble-minded pictures are correctly chosen. The 
probabilities of various combinations of right and wrong choices 
are given below — they should be verified by the reader: 

Probability of 5R, 0W = 1/252
Probability of 4R, 1W = 25/252
Probability of 3R, 2W = 100/252
Probability of 2R, 3W = 100/252
Probability of 1R, 4W = 25/252
Probability of 0R, 5W = 1/252


It may be noted that by increasing the number of pictures of feeble-minded and normal from ten to twenty, say, the sensitiveness of the experiment can be considerably enhanced. With twenty pictures it is not necessary to get all ten feeble-minded photos right in order to achieve a significant result. In fact, eight right is significant at about the .01 level, as shown below: the probability of eight or more right is (1 + 100 + 2025)/184,756, or about .0115.


$C^{20}_{10} = \frac{20!}{10!\,10!} = 184{,}756$

Combinations    Frequency    Prob. ratio (freq. ÷ 184,756)
10R  0W                1        .000005
 9R  1W              100        .0005
 8R  2W            2,025        .011
 7R  3W           14,400        .078
 6R  4W           44,100        .238
 5R  5W           63,504        .343
 4R  6W           44,100        .238
 3R  7W           14,400        .078
 2R  8W            2,025        .011
 1R  9W              100        .0005
 0R 10W                1        .000005
                 -------
Total            184,756
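Both sets of probabilities can be reproduced by counting combinations with the formula above. The sketch assumes, as the text does, that the judge picks exactly as many photos as there are feeble-minded ones and that every set of that size is equally likely under the null hypothesis (the hypergeometric distribution). The function name selection_distribution is hypothetical.

```python
from math import comb

def selection_distribution(n_each: int) -> list[tuple[int, int]]:
    """With n_each feeble-minded and n_each normal photos, and n_each photos chosen,
    return (number right, number of combinations) pairs from all right down to none right."""
    return [(r, comb(n_each, r) * comb(n_each, n_each - r)) for r in range(n_each, -1, -1)]

for n_each in (5, 10):
    total = comb(2 * n_each, n_each)  # 252 for ten photos, 184,756 for twenty
    rows = selection_distribution(n_each)
    assert sum(freq for _, freq in rows) == total  # the frequencies exhaust all possible sets
    print(f"{2 * n_each} photos, {total} possible sets of {n_each}:")
    for right, freq in rows:
        print(f"  {right}R {n_each - right}W  {freq:>6}  {freq / total:.6f}")

# Chance of eight or more right when twenty photos are used
tail = sum(freq for right, freq in selection_distribution(10) if right >= 8)
print(f"P(8 or more right) = {tail}/{comb(20, 10)} = {tail / comb(20, 10):.4f}")
```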




3. Testing the Null Hypothesis against Probabilities Calculated from the Normal Curve


When the number of observations or the number of trials is large, direct calculation of probabilities by expanding the binomial $(p + q)^n$ becomes highly laborious. Since $(p + q)^n$ yields a distribution (p. 110) which is essentially normal when n is large, in many experiments the normal curve may be usefully employed to provide expected results under the null hypothesis. An example will make the method clear.

Example (3) In answering a test of 100 true-false items,
a subject gets sixty right. Is it likely that the subject merely
guessed?

As there are only two possible answers to each item, one of which is right and the other wrong, the probability of a correct answer to any item is 1/2, and our subject should by chance answer 1/2 of 100 or 50 items correctly. Letting p equal the probability of a right answer, and q the probability of a wrong answer, we could, by expanding the binomial $(p + q)^{100}$, calculate the probability of various combinations of rights and wrongs on the null hypothesis. When the exponent of the binomial (here, number of items) is as large as 100, however, the resulting distribution is very close to the normal probability curve (p. 110) and may be so treated with little error.

Figure 46 illustrates the solution of this problem. The mean of the curve is set at 50. The SD of the probability distribution found by expanding $(p + q)^n$ is $\sigma = \sqrt{npq}$; hence for $(p + q)^{100}$, $\sigma = \sqrt{100 \times 1/2 \times 1/2}$, or 5. A score of 60 covers the interval on the baseline from 59.5 up to 60.5. The lower limit of 60 is 1.90σ removed from the mean, since $(59.5 - 50)/5 = 1.90$; and from Table 17 we find that 2.87% of the area of a normal curve lies above 1.90σ. There are only three chances in 100 that a score of 60 (or more) would be made if the null hypothesis were true. A score of 60, therefore, is significant at the .05 level. We may reject the null hypothesis with some confidence and conclude that our subject could not have been simply guessing.

FIG. 46. Normal probability curve with M = 50 and σ = 5. The lower limit of a score of 60 (59.5) lies (59.5 − 50)/5 = 1.90σ above the mean.
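A minimal check of Example (3), using only the Python standard library: the normal-curve area above z = 1.90 is computed with math.erfc in place of Table 17, and is set beside the exact binomial probability of 60 or more right out of 100, obtained by summing the expansion of (p + q)^100 directly. Because n is large and p = 1/2, the two should agree closely. The helper name normal_upper_tail is illustrative.

```python
from math import comb, erfc, sqrt

def normal_upper_tail(z: float) -> float:
    """Proportion of the unit normal curve lying above z."""
    return 0.5 * erfc(z / sqrt(2))

n, p = 100, 0.5
q = 1 - p
mean, sigma = n * p, sqrt(n * p * q)      # 50 and 5
z = (59.5 - mean) / sigma                 # lower limit of a score of 60
approx = normal_upper_tail(z)
exact = sum(comb(n, k) for k in range(60, n + 1)) / 2**n
print(f"z = {z:.2f}; normal-curve P = {approx:.4f}; exact binomial P = {exact:.4f}")
```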

Note that the problem above could have been solved equally well in terms of percentages. We should expect our subject to get 50% of the items right by guessing. The SD of this percentage is $100\sqrt{pq/n}$, or $100\sqrt{.50 \times .50/100}$, or 5%. A score of 60% (lower limit 59.5%) is 9.5%, or 1.90σ, distant from the middle of the curve. We interpret this result in exactly the same way as that above.
Example (4) A multiple-choice test of sixty items provides
four possible responses to each item. How many items should 
a subject answer correctly before we may feel sure that he 
knows something about the test material? 


Since there are four responses to each item, only one of which is correct, the probability of a right answer by guessing is 1/4, of a wrong answer 3/4. The final score to be expected if a subject knows nothing whatever about the test and simply guesses is 1/4 × 60 or 15. Our task, therefore, is to determine how much better than 15 a subject must score in order to demonstrate real knowledge of the material.

This problem could be solved by the methods of Example (1). By expanding the binomial $(p + q)^{60}$ in which p = 1/4, q = 3/4, and n = 60, we can determine the probability of the occurrence of any score from 0 to 60. The direct determination of probabilities from the binomial expansion is straightforward and exact but the calculation is rather tedious. A satisfactory approximation to the answer we want may be obtained by using the normal probability distribution to determine probabilities, as in Example (3). The mean of our “chance” distribution is 1/4 of 60 or 15; and $\sigma = \sqrt{npq} = \sqrt{60 \times 1/4 \times 3/4}$, or 3.35. From Table 17 we know that 5% of the frequency in a normal distribution lies above 1.65σ. Multiplying our obtained σ (3.35) by 1.65, we get 5.53; and this value when added to 15 gives us 20.5 as the point above which lies 5% of the “chance” distribution of scores. A score of 21 (20.5 to 21.5), therefore, may be regarded as significant, and if a subject achieves such a score we can be reasonably sure that he is not merely guessing.

For a higher level of assurance, we may take that score which would occur by chance only once in a hundred trials. From Table 17, 1% of the frequency in the normal curve lies above 2.33σ. This point is 7.81 (3.35 × 2.33) above 15, or at 22.8. A score of 23, therefore, or a higher score is very significant; only once in one hundred trials would a subject achieve such a score by guessing.

Use of the normal probability curve in the solution of problems like this always involves a degree of approximation. When p differs considerably from 1/2 and n is small, the distribution resulting from the expansion of $(p + q)^n$ is skewed and is not therefore accurately described by the normal curve. Under these circumstances one must resort to the direct determination of probabilities as in Example (1). When n is large, however, and p not far from 1/2, the normal distribution can be safely used, as will be shown by the chi-square tests on page 245.
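The caution above can be checked for Example (4). The sketch below (standard library only; the helper exact_upper_tail is illustrative) finds the normal-curve points 15 + 1.65σ and 15 + 2.33σ, takes the smallest whole score at or above each, and prints the exact binomial probability of that score or more when every answer is a guess. Because p = 1/4 is some distance from 1/2, the exact values should not be expected to equal .05 and .01 precisely.

```python
from math import ceil, comb, sqrt

n, p = 60, 1 / 4                        # sixty four-choice items, guessing only
q = 1 - p
mean, sigma = n * p, sqrt(n * p * q)    # 15 and about 3.35

def exact_upper_tail(k: int) -> float:
    """Exact binomial P(score >= k) when every answer is a guess."""
    return sum(comb(n, i) * p**i * q ** (n - i) for i in range(k, n + 1))

for z, level in ((1.65, ".05"), (2.33, ".01")):
    cutoff = mean + z * sigma           # normal-curve point cutting off the upper tail
    score = ceil(cutoff)                # smallest whole score at or above that point
    print(f"{level} level: cutoff {cutoff:.1f}, score {score}, "
          f"exact P(score >= {score}) = {exact_upper_tail(score):.4f}")
```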




II. THE χ² (CHI-SQUARE) TEST


The chi-square test represents a useful method of evaluating experimentally determined results against results to be expected on some hypothesis. The formula for chi-square (χ²) is

$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right]$$

(chi-square formula for testing agreement between observed and expected results)

in which

f_o = frequency of occurrence of observed or experimentally determined facts;
f_e = expected frequency of occurrence on some hypothesis.

The differences between observed and expected frequencies are squared and divided by the expected number in each case, and the sum of these quotients is χ². The more closely the observed results approximate to the expected, the smaller is chi-square and the closer the agreement between the observed data and the hypothesis being tested. On the other hand, the larger the chi-square, the greater the probability of a real divergence of experimentally observed results from expected results. To evaluate chi-square, we enter Table 32 with the given value of chi-square and with df, the number of degrees of freedom. The quantity df = (r − 1)(c − 1), in which r is the number of rows and c the number of columns in which the data are tabulated. From Table 32 we find P, the probability that a χ² as large as the obtained value (or larger) would occur by chance if the hypothesis being tested were true. Several illustrations of the chi-square test will be given in the sections following.


1. Testing the Divergence of Observed Values from Values Calculated on the Hypothesis of Equal Probability (Null Hypothesis)

Example (1) Forty-eight subjects are asked to express 
their attitude toward the proposition “Should the United 
States Join a Security Organization of Nations?" by marking 
F (favorable), I (indifferent), or U (unfavorable). Of the members in the group, twenty-four marked F, twelve I, and twelve U. Do these results indicate a significant trend of opinion?


TABLE 32
Table of χ²
[The values of χ² for df = 1 to 30, at levels of P from .99 to .01, are printed in the body of the table.]


The observed data (f_o) are given in the first row of Table 33. In the second row is the distribution of answers to be expected on the null hypothesis (f_e), if each answer is selected equally often. Below these are entered the differences (f_o − f_e). Each of these differences is squared and divided by its f_e (64/16 + 16/16 + 16/16) to give χ² = 6.


TABLE 33

                                     Answers
                   Favorable   Indifferent   Unfavorable   Total
Observed (f_o)         24           12            12         48
Expected (f_e)         16           16            16         48
f_o − f_e               8           −4            −4
(f_o − f_e)²           64           16            16
(f_o − f_e)²/f_e        4            1             1

χ² = Σ[(f_o − f_e)²/f_e] = 6      df = 2      P = .05 (Table 32)


The degrees of freedom in the table may be readily calculated from the formula df = (r − 1)(c − 1) to be (3 − 1)(2 − 1) or 2. Also, the degrees of freedom may be found directly in the following way: Since we know the row totals to be 48, when two entries are made in a row the third is immediately fixed, is not “free.” When the first two entries in row 1 are 24 and 12, for example, the third entry must be 12 to make up 48. Since we also know the sums of the columns, only one entry in a column is free, the second being fixed as soon as the first is tabulated. There are, then, two degrees of freedom for rows and one degree of freedom for columns, and 2 × 1 = 2 degrees of freedom for the table.

Entering Table 32 we find in row df = 2 a χ² of almost 6 (actually, 5.991) in the column headed .05. A P of .05 means that should we repeat this experiment, only once in twenty trials would a χ² of 6 (or more) be expected to occur if the null hypothesis were true. Our result may be marked “significant at the .05 level,” therefore, on the grounds that the divergence of observed from expected results is too large to be attributed solely to sampling fluctuations. We reject the “equal answer” hypothesis and conclude that our group really favors the proposition. In general, we may safely discard a null hypothesis whenever P is .05 or less.
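The meaning of P = .05 here, that a χ² of 6 or more would turn up about once in twenty repetitions if the null hypothesis were true, can be illustrated by simulation. The sketch below assumes that each of 48 subjects picks F, I, or U with equal probability, repeats that imaginary experiment 10,000 times with a fixed random seed, and counts how often χ² reaches 6. The trial count and seed are arbitrary choices, and the proportion it prints is only an approximation.

```python
import random

def chi_square(observed, expected):
    """Sum of (fo - fe)**2 / fe over the categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

random.seed(1)                          # fixed seed so the run is repeatable
n_subjects, categories = 48, 3          # F, I, U chosen equally often under the null hypothesis
expected = [n_subjects / categories] * categories
trials, hits = 10_000, 0
for _ in range(trials):
    counts = [0] * categories
    for _ in range(n_subjects):
        counts[random.randrange(categories)] += 1
    if chi_square(counts, expected) >= 6:
        hits += 1
print(f"proportion of chance samples with chi-square >= 6: {hits / trials:.3f}")
```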


Example (2) The items in an attitude scale are answered
by underlining one of the following phrases: Strongly approve,
approve, indifferent, disapprove, strongly disapprove.
The distribution of answers to an item marked by 100 subjects
is shown in Table 34. Do these answers diverge significantly
from the distribution to be expected if there are no
preferences in the group?


TABLE 34

                 Strongly                                     Strongly
                 approve   Approve  Indifferent  Disapprove  disapprove  Total
Observed (f_o)      23        18        24           17          18       100
Expected (f_e)      20        20        20           20          20       100
f_o − f_e            3        −2         4           −3          −2
(f_o − f_e)²         9         4        16            9           4
(f_o − f_e)²/f_e   .45       .20       .80          .45         .20

χ² = 2.10      df = 4      P lies between .70 and .80
On the null hypothesis of “equal probability,” twenty subjects may be expected to select each of the five possible answers. Squaring the (f_o − f_e), dividing by the expected result (f_e), and summing, we obtain a χ² of 2.10, df = (5 − 1)(2 − 1) or 4. From Table 32, reading across from row df = 4, we locate a χ² of 2.195 in column .70. This χ² is nearest to our calculated value of 2.10, which lies between the entries in columns .70 and .80. It is sufficiently accurate to describe P as lying between .70 and .80 without interpolation. Since this much divergence from the null hypothesis, namely, a χ² of 2.10 or more, can be expected to occur upon repetition of the experiment in roughly 70 to 80% of the trials, χ² is clearly not significant and we must retain the null hypothesis. There is no conclusive evidence of either a favorable or unfavorable attitude toward this item.
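Tables 33 and 34 can be recomputed in a few lines. In place of a lookup in Table 32, the sketch below uses the standard closed forms for the upper tail of the chi-square distribution with an integer number of degrees of freedom: a finite Poisson-type sum when df is even, and an erfc term plus a finite series when df is odd. The helper names chi_square and chi2_upper_tail are hypothetical, and df is computed as the text does, (number of categories − 1)(2 − 1).

```python
import math

def chi_square(observed, expected):
    """Sum of (fo - fe)**2 / fe over the categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-square >= x) for a positive integer df, via closed forms (no table needed)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(x)
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * k + 1)
    return p

examples = {
    "Table 33": ([24, 12, 12], [16, 16, 16]),
    "Table 34": ([23, 18, 24, 17, 18], [20] * 5),
}
for name, (fo, fe) in examples.items():
    x2 = chi_square(fo, fe)
    df = (len(fo) - 1) * (2 - 1)   # the text's (r - 1)(c - 1) for these one-way tables
    print(f"{name}: chi-square = {x2:.2f}, df = {df}, P = {chi2_upper_tail(x2, df):.3f}")
```

For Table 33 this gives a P of about .050, matching the text's .05; for Table 34 a P of about .72, between the .70 and .80 columns.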


2. Testing Divergence of Observed Values from Values Calculated on the Hypothesis of a Normal Distribution
Our hypothesis may assert that the frequencies of an event 
which we have observed really follow the normal distribution 
instead of being equally probable. An example illustrates how 
this hypothesis may be tested by chi-square. 

Example (3) Forty-two salesmen have been classified 
into five groups — excellent, very good, satisfactory, poor, and 
very poor — by a consensus of sales managers. Does this dis- 
tribution of ratings differ significantly from that to be expected 
if selling ability is normally distributed? 


TABLE 35

                 Excellent  Very good  Satisfactory  Poor  Very poor  Total
Observed (f_o)       6          10          20          4       2        42
Expected (f_e)      1.5         10          19         10      1.5       42
f_o − f_e           4.5          0           1         −6       .5
(f_o − f_e)²      20.25          0           1         36      .25
(f_o − f_e)²/f_e  13.50          0          .05       3.60     .17

χ² = 17.32      df = 4      P is less than .01


The entries in row 1 give the number of men classified in each of the five categories. In row 2, the entries show how many of the forty-two salesmen may be expected to fall in each category on the hypothesis of a normal distribution. These last entries were found by dividing the baseline of a normal curve (taken to extend over 6σ) into five equal segments of 1.2σ each.




From Table 17, the proportions of the normal distribution to be found in each of these segments are as follows:

                                     Proportion
Between +3.00σ and +1.80σ               .035
Between +1.80σ and  +.60σ               .24
Between  +.60σ and  −.60σ               .45
Between  −.60σ and −1.80σ               .24
Between −1.80σ and −3.00σ               .035


These proportions, multiplied by forty-two, give the expected frequencies entered in Table 35. The χ² in the table is 17.32 and df = (5 − 1)(2 − 1) or 4. From Table 32 it is clear that this value of χ² lies beyond the limits of the table; hence P is listed simply as less than .01. The discrepancy between observed and expected values is so great that the hypothesis of a normal distribution of selling ability must be rejected. Too many men have been described as excellent, and too few as poor, to make for agreement with our hypothesis.
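The expected frequencies of Table 35 can be reproduced in a few lines. The following is a minimal Python sketch, assuming the baseline runs from −3σ to +3σ in five equal 1.2σ segments as described above; the helper names (`normal_cdf`, `segment_proportions`, `chi_square`) are illustrative and not taken from the text. The unrounded expected frequencies sum to slightly less than 42 (the area beyond ±3σ is ignored) and differ a little from the rounded entries in the table, so χ² computed from them will not equal 17.32 exactly; the table's own rounded entries do give 17.32.

```python
import math


def normal_cdf(z):
    """Proportion of the unit normal curve lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def segment_proportions(k=5, half_range=3.0):
    """Divide the baseline from +half_range to -half_range (in sigma units)
    into k equal segments; return the area in each, top segment first."""
    width = 2 * half_range / k                              # 1.2 sigma when k = 5
    edges = [half_range - i * width for i in range(k + 1)]  # +3.0, +1.8, ..., -3.0
    return [normal_cdf(edges[i]) - normal_cdf(edges[i + 1]) for i in range(k)]


def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [6, 10, 20, 4, 2]               # Table 35, excellent ... very poor
props = segment_proportions()
expected_unrounded = [42 * p for p in props]
expected_table = [1.5, 10, 19, 10, 1.5]    # rounded entries used in Table 35

print([round(p, 3) for p in props])
print(round(chi_square(observed, expected_table), 2))
print(round(chi_square(observed, expected_unrounded), 2))
```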


3. The Chi-Square Test When Table Entries Are Small 


When table entries are large, estimates of probability given by the χ²-test are usually quite close to those obtained by direct methods. But when table entries are small (say five or less), and especially when the table is 2 × 2 fold (when the number of degrees of freedom is 1), the chi-square test is subject to considerable error. It is customary in such cases to make a correction, called the correction for continuity.* Reasons for making this correction will be best understood from the examples following.


* Goulden, C. H., Methods of Statistical Analysis (1939), pp. 101-110. Snedecor, G. W., Statistical Methods (3rd ed., 1940).


TESTING EXPERIMENTAL HYPOTHESES 247


Example (4) In Example (1), page 234, an observer gave seven correct judgments in ten trials. The probability of a right judgment was 1/2 in each instance, so that the expected number of correct judgments was five. Test our subject's deviation from the null hypothesis by computing chi-square and compare the P with that found by direct calculation.
TABLE 36

                        Right    Wrong    Total
Observed (fo)             7        3        10
Expected (fe)             5        5        10
(fo − fe)                 2       −2
Correction (−.5)          1.5      1.5
(fo − fe)²                2.25     2.25
(fo − fe)²/fe              .45      .45

χ² = .90     df = 1
P = .356 (by interpolation in Table 32)
½P = .178


Calculations in Table 36 follow those of previous tables except for the correction, which consists in subtracting .5 from each (fo − fe) difference taken without regard to sign. In applying the χ²-test we assume that adjacent frequencies are connected by a continuous and smooth curve (like the normal curve) and are not discrete numbers. In 2 × 2 fold tables in which the entries are small, however, the frequencies are discrete and the curve is not continuous. Hence, the deviation of 7 from 5 must be written as 1.5 (6.5 − 5) instead of 2 (7 − 5), since 6.5 is the lower limit of 7 in a continuous series. In like manner the deviation of 3 from 5 must be taken from the upper limit of 3, namely, 3.5 (see Fig. 46). Still another change in procedure must be made in order to have the probability obtained from χ² agree with the direct determination of probability. P in the χ² table gives the probability of 7 or more right answers and of 3 or less right answers, i.e., it takes account of both ends of the probability curve. We must take 1/2 of P, therefore, if we want only the probability of 7 or more right answers. Note that the P/2 of .178 is very close to the P of .172 got by the direct method on page 235. If we repeated our test we should expect a score of 7 or better about seventeen times in 100 trials. It is clear, therefore, that the obtained score is not significant and does not refute the null hypothesis.

It should be noted that had we omitted the correction for 
continuity, chi-square would have been 1.60 and P/2 (by inter- 
polation in Table 32) .095. It is clear that failure to use the 
correction causes the probability to be greatly underestimated 
and the significance of our result considerably increased. 

When the expected entries in a 2 × 2 fold table are the same (as in Tables 36 and 37), the formula for chi-square may be written in a somewhat shorter form as follows:

$$\chi^2 = \frac{2(f_o - f_e)^2}{f_e} \qquad (43)$$

(short formula for χ² in 2 × 2 fold tables when expected frequencies are equal)

Applying formula (43) to Table 36 we get a chi-square of

$$\chi^2 = \frac{2(1.5)^2}{5} = .90$$
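A minimal Python sketch of the correction for continuity, checked against the direct (binomial) probability for Example (4). The function names are hypothetical. Instead of interpolating in Table 32, it obtains P for 1 degree of freedom from the identity P = erfc(√(χ²/2)), so its half-P (about .171) will differ somewhat from the interpolated .178 and in fact lies closer to the direct .172.

```python
import math


def chi_square_corrected(observed, expected):
    """Chi-square with the correction for continuity:
    .5 is taken from each absolute (fo - fe) before squaring."""
    return sum((abs(fo - fe) - 0.5) ** 2 / fe for fo, fe in zip(observed, expected))


def short_formula(deviation, fe):
    """Formula (43): 2(fo - fe)^2 / fe, for two cells with equal expected entries."""
    return 2 * deviation ** 2 / fe


def p_chi_square_1df(chi2):
    """Upper-tail P of chi-square with 1 df (covers both ends of the normal curve)."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def binomial_upper_tail(k, n, p):
    """Direct probability of k or more right answers in n trials."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


chi2 = chi_square_corrected([7, 3], [5, 5])   # Table 36
print(chi2, short_formula(1.5, 5))
print(p_chi_square_1df(chi2) / 2)              # one end only: 7 or more right
print(binomial_upper_tail(7, 10, 0.5))         # direct method, page 235
```

`math.comb` requires Python 3.8 or later.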

Example (5) In Example (3), page 238, a subject achieved a score of sixty right on a test of 100 true-false items. From the chi-square test, determine whether this subject was merely guessing. Compare your result with that found on page 238 when the normal curve hypothesis was employed.


TABLE 37

                        Right    Wrong    Total
Observed (fo)            60       40       100
Expected (fe)            50       50       100
(fo − fe)                10      −10
Correction (−.5)          9.5      9.5
(fo − fe)²               90.25    90.25
(fo − fe)²/fe             1.81     1.81

χ² = 3.61     P = .059
df = 1        ½P = .0295 or .03


Although the cell entries in Table 37 are large, use of the correction for continuity will be found to yield a result in somewhat closer agreement with that found on page 238 than can be obtained without the correction. As shown in Figure 46, the probability of a score of 60 or more is that part of the curve lying above 59.5, the lower limit of 60. In Table 37, the P of .059 gives us the probability of a score of 60 or more and of 40 or less. Hence we must take 1/2 of P (i.e., .0295) to give us the probability of a score of 60 or more. Agreement between the probability given by the χ²-test and by direct calculation (p. 238) is very close. Note that when χ² is calculated without the correction, we get a P/2 of .024, a slight underestimation. In general, the correction for continuity has little effect when table entries are large (as here). But failure to use the correction, even when numbers are fairly large, may lead to some underestimation of the probability; hence it is generally wise to use it.
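Why the agreement with the normal-curve result is so close can be seen from the algebra: for a two-cell (right, wrong) table with 1 degree of freedom, the corrected χ² is exactly the square of the normal deviate measured from the exact limit of the score, z = (|X − Np| − .5)/√(Npq). A short Python check of that identity, with illustrative function names:

```python
import math


def corrected_z(score, n, p):
    """Normal deviate measured from the exact limit of the score:
    (|X - Np| - .5) / sqrt(Npq)."""
    return (abs(score - n * p) - 0.5) / math.sqrt(n * p * (1 - p))


def corrected_chi_square(score, n, p):
    """Two-cell (right, wrong) chi-square with the correction for continuity."""
    observed = [score, n - score]
    expected = [n * p, n * (1 - p)]
    return sum((abs(fo - fe) - 0.5) ** 2 / fe for fo, fe in zip(observed, expected))


def normal_upper_tail(z):
    """Area of the normal curve beyond z (one end only)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z = corrected_z(60, 100, 0.5)               # (59.5 - 50) / 5
chi2 = corrected_chi_square(60, 100, 0.5)   # Table 37
print(z, chi2, z ** 2)
print(normal_upper_tail(z))                 # equals P/2 for chi2 with 1 df
```

Because χ² = z² here, half of the χ² probability is the area of one end of the normal curve beyond z. Computed exactly, this half-P is about .029; the .0295 of Table 37 comes from interpolation in Table 32.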
Example (6) In Example (4), page 239, given a multiple-choice test of sixty items (four possible answers to each item), we were required to find what score a subject must achieve in order to demonstrate knowledge of the test material. By use of the normal probability distribution, it was shown that a score of 21 is reasonably significant and a score of 23 highly significant. Can these results be verified by the chi-square test?
In Table 38 an obtained score of 21 is tested against an expected score of 15. In the first line of the table the observed values (fo) are 21 right and 39 wrong; in the second line, the expected or "guess" values are 15 right and 45 wrong. Making the correction for continuity we obtain a χ² of 2.69, a P of .10 and 1/2 P of .05. Only once in twenty trials would we expect a score of 21 or higher to occur if the subject were merely guessing and had no knowledge of the test material. This answer checks the result obtained on page 240.

TABLE 38

                        R        W      Total
fo                     21       39       60
fe                     15       45       60
(fo − fe)               6       −6
Correction (−.5)        5.5      5.5
(fo − fe)²             30.25    30.25
(fo − fe)²/fe           2.02      .67

χ² = 2.69     P = .10
df = 1        ½P = .05

In Table 39 a score of 23 is tested against the expected score of 15. Making the correction for continuity, we obtain a χ² of 5.00, which yields a P of .0275 and 1/2 P of .0138. Again this result closely checks the answer obtained on page 240 by use of the normal probability curve.

TABLE 39

                        R        W      Total
fo                     23       37       60
fe                     15       45       60
(fo − fe)               8       −8
Correction (−.5)        7.5      7.5
(fo − fe)²             56.25    56.25
(fo − fe)²/fe           3.75     1.25

χ² = 5.00     P = .0275
df = 1        ½P = .0138 or .01
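The rows of Tables 38 and 39 follow a fixed recipe, so they can be generated for any score, number of items, and chance probability of a right answer. A hedged Python sketch (the function `yates_rows` is a hypothetical name) for the sixty-item test with four choices per item, i.e., p = 1/4:

```python
def yates_rows(right, n, p):
    """Rebuild the rows of a table like Table 38 or 39: a score of `right`
    out of `n` items when the chance probability of a right answer is p."""
    fo = [right, n - right]
    fe = [n * p, n * (1 - p)]
    deviation = [abs(o - e) for o, e in zip(fo, fe)]
    corrected = [d - 0.5 for d in deviation]       # correction for continuity
    squared = [c ** 2 for c in corrected]
    quotients = [s / e for s, e in zip(squared, fe)]
    return {"fo": fo, "fe": fe, "fo-fe": deviation, "corrected": corrected,
            "squared": squared, "quotients": quotients, "chi2": sum(quotients)}


for score in (21, 23):                  # Tables 38 and 39
    rows = yates_rows(score, 60, 0.25)
    print(score, [round(q, 2) for q in rows["quotients"]], round(rows["chi2"], 2))
```

The short formula (43) does not apply to these tables, since the two expected entries (15 and 45) are unequal; each squared deviation is divided by its own fe.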


4. The χ²-Test When Table Entries Are in Percentages

… with percentage entries … This follows from … the significance of an … is the same in both cases. If the entries in Table 36 are expressed as percentages, we have:

                          R        W      Total
fo                       70%      30%     100%
fe                       50%      50%     100%
(fo − fe)                20%      20%
Correction* (−5%)        15%      15%
(fo − fe)²               225      225

$$\chi^2 = \frac{2(225)}{50} = 9 \qquad \text{(formula 43)}$$

$$\chi^2 = 9 \times \frac{10}{100} = .90 \qquad \text{(Table 36)}$$

It is clear that in order to bring χ² to its proper value in terms of original numbers we must multiply the "percent" χ² by 10/100 to give .90. A χ² calculated from percentages must always be multiplied by N/100 (N = number of observations) in order to adjust it to the actual frequencies in the given sample.
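The N/100 rule follows because expressing entries as percentages multiplies every deviation and every expected value by 100/N, so each term (fo − fe)²/fe is multiplied by 100/N. A small Python check on the Table 36 data, assuming the continuity correction is rescaled in the same way to 100 × .5/N, i.e., 5% when N = 10 (names are illustrative):

```python
def chi_square_corrected(observed, expected, correction):
    """Chi-square with a continuity correction expressed in the same units as the entries."""
    return sum((abs(fo - fe) - correction) ** 2 / fe for fo, fe in zip(observed, expected))


n = 10
freq_chi2 = chi_square_corrected([7, 3], [5, 5], 0.5)          # Table 36, frequencies

pct_observed = [100 * f / n for f in (7, 3)]                     # 70%, 30%
pct_expected = [100 * f / n for f in (5, 5)]                     # 50%, 50%
pct_chi2 = chi_square_corrected(pct_observed, pct_expected, 100 * 0.5 / n)  # 5% correction

print(freq_chi2, pct_chi2, pct_chi2 * n / 100)
```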


* The unit here is 10%, so that 5% must be subtracted from each (fo − fe) difference. Thus (70% − 50%) is actually (65% − 50%), and (30% − 50%) is (35% − 50%). See page 247.


5. The χ²-Test of Independence in Contingency Tables

We have seen that χ² may be employed to test the agreement between observed results and those expected on some hypothesis. A further useful application of chi-square can be made when we wish to investigate the relationship between traits or attributes which can be classified into two or more categories. The same persons, for example, may be classified as to hair color (light, brown, black, red) and as to eye color (blue, gray, brown), and the correspondence in these attributes noted. Or fathers and sons may be classified with respect to interests or temperament or achievement and the relationship of the attributes in the two groups studied.

Table 40 is a contingency table, i.e., a double-entry or two-way table in which the possession by a group of varying degrees of two characteristics is represented. In the tabulation in Table 40, 413 persons have been classified as to "eyedness" and "handedness." Eyedness, or eye dominance, is described as left-eyed, ambiocular, or right-eyed; handedness as left-handed, ambidextrous, or right-handed. Reading down the first column we find that of 118 left-eyed persons, 34 are left-handed, 27 ambidextrous and 57 right-handed. Across the first row we find 124 left-handed persons, of whom 34 are left-eyed, 62 ambiocular and 28 right-eyed. The other columns and rows are interpreted in the same way.


TABLE 40

Comparison of Eyedness and Handedness in 413 Persons*
(independence values in parentheses)

                     Left-Eyed     Ambiocular     Right-Eyed     Totals
Left-handed          34 (35.4)     62 (58.5)      28 (30.0)        124
Ambidextrous         27 (21.4)     28 (35.4)      20 (18.2)         75
Right-handed         57 (61.1)    105 (101.0)     52 (51.8)        214
Totals              118           195            100               413

I. Calculation of independence values (fe):

118 × 124/413 = 35.4     195 × 124/413 = 58.5      100 × 124/413 = 30.0
118 × 75/413  = 21.4     195 × 75/413  = 35.4      100 × 75/413  = 18.2
118 × 214/413 = 61.1     195 × 214/413 = 101.0     100 × 214/413 = 51.8

II. Calculation of χ²:

(−1.4)² ÷ 35.4 = .055     (3.5)² ÷ 58.5 = .209      (−2.0)² ÷ 30.0 = .133
(5.6)² ÷ 21.4 = 1.465     (−7.4)² ÷ 35.4 = 1.547    (1.8)² ÷ 18.2 = .178
(−4.1)² ÷ 61.1 = .275     (4.0)² ÷ 101.0 = .158     (.2)² ÷ 51.8 = .001

χ² = 4.02     df = 4     P lies between .30 and .50

* From Woo, T. L., Biometrika (1936), 20A, pp. 79-118.


The hypothesis to be tested is the null hypothesis, namely, that handedness and eyedness are essentially unrelated or independent. In order to compute χ² we must first calculate an "independence value" for each cell in the contingency table. Independence values are represented by figures in parentheses within the different cells; they give the number of people whom we should expect to find possessing the designated eyedness and handedness combinations in the absence of any real association. The method of calculating independence values is shown in Table 40. To illustrate with the first entry, there are 118 left-eyed and 124 left-handed persons. If there were no association between left-eyedness and left-handedness we should expect to find, by chance, (118 × 124)/413 or 35.4 individuals in our group who are left-eyed and left-handed. The reason for this may readily be seen. We know that 118/413 of the entire group is left-eyed. This proportion of left-eyed individuals should hold for any sub-group, if there is no dependence of eyedness on handedness. Hence, 118/413 or 28.6% of the 124 left-handed individuals, i.e., 35.4, should also be left-eyed. Independence values for all cells are shown in Table 40.

When the expected or independence values have been computed, we find the difference between the observed and expected values for each cell, square each difference and divide in each instance by the independence value. The sum of these quotients by formula (42) gives χ². In the present problem χ² = 4.02 and df = (3 − 1)(3 − 1) or 4. From Table 32 we find that P lies between .30 and .50, and hence χ² is not significant. The observed results are close to those to be expected on the hypothesis of independence, and there is no evidence of any real association between eyedness and handedness within our group.
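The independence values and χ² of Table 40 can be computed directly from the cell frequencies. A minimal Python sketch (function names are hypothetical); it carries full precision rather than rounding each independence value to one decimal, so individual cell quotients may differ from Table 40 in the last place, though χ² still comes to about 4.02.

```python
def independence_values(table):
    """Expected frequency for each cell: (row total x column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def contingency_chi_square(table):
    """Chi-square test of independence for an r x c contingency table."""
    expected = independence_values(table)
    chi2 = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Table 40: rows are left-handed, ambidextrous, right-handed;
# columns are left-eyed, ambiocular, right-eyed.
table_40 = [[34, 62, 28],
            [27, 28, 20],
            [57, 105, 52]]

chi2, df, expected = contingency_chi_square(table_40)
print(round(chi2, 2), df)
print([[round(e, 1) for e in row] for row in expected])
```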


III. THE ANALYSIS OF VARIANCE

Analysis of variance represents still another means of testing the null hypothesis. The term "analysis of variance" includes (a) a variety of experimental designs or arrangements, as well as (b) certain statistical techniques appropriate for use with these designs. The statistical methods employed in analysis of variance are not new (as they are often thought to be), but are, in reality, adaptations of methods described earlier in this book. The experimental designs, on the other hand, are in many instances new, at least to psychology. These systematic procedures will often provide a more efficient test of the null hypothesis than methods now customarily used.

In the following sections certain elementary applications of analysis of variance to experimental psychology will be shown by means of a problem which illustrates the simplest design. It is hoped that by working through this problem the reader will become acquainted with the mechanics of analysis of variance, as well as with some of its possibilities. For further and more comprehensive treatments of this topic the reader should consult the books listed below.* Only a brief outline is attempted here.

* Snedecor, G. W., Statistical Methods (1946), Chapters 10, 11, 12, 13, 15.
  Goulden, C. H., Methods of Statistical Analysis (1939), Chapters 5, 11, …
  Lindquist, E. F., Statistical Analysis in Educational Research (1940).
  Fisher, R. A., The Design of Experiments (1935).
  Fisher, R. A., Statistical Methods for Research Workers.
  (The Fisher references will be difficult for the …)


1. How Variances Can Be Analyzed

The variability within a set of scores (N large) may be measured by the standard deviation, σ = √(Σx²/N), but it may also be expressed in terms of the "variance" or σ² = Σx²/N. A decided advantage of variances over SD's is that variances are oftentimes additive, and the sums of squares upon which variances are based always are. As an example, suppose we add the two independent scores X and Y to get the composite score Z. Expressing X, Y, and Z as deviations x, y, and z from their means Mx, My, and Mz, we may write

$$z = x + y$$

and, squaring and summing,

$$\Sigma z^2 = \Sigma x^2 + \Sigma y^2$$

(The term in xy drops out since there is no correlation between x and y; x and y are independent by hypothesis.) Dividing by N, we have

$$\sigma_z^2 = \sigma_x^2 + \sigma_y^2$$

$$\text{and} \quad \sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2}$$

The first equation, in terms of variances, is more convenient for analysis than is the equation in terms of standard deviations. If we divide through by σ²z, for example, we find that

$$1 = \frac{\sigma_x^2}{\sigma_z^2} + \frac{\sigma_y^2}{\sigma_z^2}$$

from which we are able to determine what proportion of the total variance (σ²z) is attributable to the variance of X and what proportion is attributable to the variance of Y. Analysis into proportional contributions cannot be made with standard deviations.
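A small numerical illustration of the additivity of variances, using artificial scores (invented for this sketch, not taken from the text) chosen so that the cross-product term Σxy is exactly zero. With real samples Σxy is seldom exactly zero even when X and Y are independent, so the relation then holds only approximately.

```python
def variance(values):
    """Population variance: sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Artificial scores (not from the text) chosen so that the deviations
# x = X - Mx and y = Y - My have a cross-product sum of exactly zero.
X = [11, 9, 11, 9]          # x = +1, -1, +1, -1
Y = [22, 22, 18, 18]        # y = +2, +2, -2, -2
Z = [a + b for a, b in zip(X, Y)]

var_x, var_y, var_z = variance(X), variance(Y), variance(Z)
print(var_x, var_y, var_z)                        # variances add: 1 + 4 = 5
print(var_x / var_z, var_y / var_z)               # proportional contributions
print(var_z ** 0.5, var_x ** 0.5 + var_y ** 0.5)  # SDs do not add
```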

The technique of variance analysis is illustrated by the data in Table 41. From a large group of fifth-grade boys, four boys are given a test under condition A, four under condition B, and four under condition C. Subjects are assigned at random to each of the three groups. Do the mean scores achieved under conditions A, B, and C differ significantly?

We may begin with the null hypothesis, namely, that the three different conditions do not really influence the final scores and that variations in the performance of the three groups are no greater than might be expected by chance. To test this hypothesis we may compare the variation attributable to the different methods with the variation to be expected in a group of boys all of whom have taken the test under the same method. The variation exhibited by all twelve boys is to be divided, then, into two portions: (1) the variance attributable to methods (the between-methods effect), and (2) the variance attributable to subjects (the within-groups effect); and these two variances are to be compared. The procedure is outlined in the following steps, which parallel the calculations in Table 41:

Step 1

The total variation is obtained first by summing the squares of the deviations from the mean of all twelve boys. The general mean is 12, and the sum of the squares of deviations around this general mean (GM) is 178.


TABLE 41

To Illustrate the Use of the Ratio "Variance among Methods" to "Variance within Groups" in Determining the Significance of Differences among Means. Three Groups of Four Subjects Each Work by Methods A, B, and C. The Data Are Artificial.

                    Methods
              A        B        C
              …        …        …
             12       17        6
              8        …        …
              …        …        …
  Means      11       15       10          General Mean = 12

Steps:

1. Sum of squares of deviations of the scores in A, B, and C around GM of 12 = 178

2. Sum of squares of deviations of M's of A, B, and C around GM of 12
   = (11 − 12)² + (15 − 12)² + (10 − 12)² = 1 + 9 + 4 = 14

3. (a) For list A: sum of squares of deviations around M of 11 = 20
   (b) For list B: sum of squares of deviations around M of 15 = 78
   (c) For list C: sum of squares of deviations around M of 10 = 24
                                                          Total = 122

6. Variance (between methods) = 56/2 = 28
   Variance (within groups) = 122/9 = 13.56
   F-test: F = Variance (between)/Variance (within) = 28/13.56 = 2.07

The F of 2.07 is smaller than 4.26 and is not significant at the .05 level.

Step 2

The variation attributable to methods is obtained in the following way. The mean of method A is 11; of method B, 15; and of method C, 10. The sum of the squares of the deviations of these three means (11, 15, and 10) around the GM of 12 is 14.


Step 3

The variance attributable to subjects, sometimes called the residual variance, is found by adding the sums of squares within columns. The sum of squares of deviations of the four scores in A around their mean of 11 is 20; the sum of squares of deviations of the four scores in B around their mean of 15 is 78; and the sum of squares of deviations of the four scores in C around their mean of 10 is 24. Adding these, we get 122 as the sum of squares of deviations within the columns A, B, C. The sum of squares in each column is taken around its own mean. Hence the final sum gives the variation attributable to subjects, and is independent of systematic differences from column to column.

Step 4

Writing the sums of squares in the form of an equation, we have that 178 = 122 + 4 × 14, or sum of squares around GM = sum of squares within methods + n (i.e., 4) × sum of squares between the M's of methods. The sum of squares around a GM can always be broken down (as here) into component sums of squares.


2. Degrees of Freedom

Each of these sums of squares becomes a variance when divided by the appropriate number of degrees of freedom.

Step 5

Since there are 12 scores in all (A + B + C), the divisor for 178 (sum of squares of deviations around GM) is (N − 1) or 11 degrees of freedom. The divisor for 122 (sum of squares of deviations around the group means) is 9 degrees of freedom, as there are (n − 1) or 3 degrees of freedom in each list and 3 × 3 or 9 degrees of freedom in the three lists. This leaves 2 degrees of freedom as the divisor for 4 × 14 (sum of squares of deviations among M's). Expressing the degrees of freedom as an equation, we have 11 = 9 + 2, or degrees of freedom for sum of squares of deviations around GM = degrees of freedom within groups plus degrees of freedom for sum of squares of deviations among M's of methods.


3. Measuring Significance by Means of the Ratio of "Between" to "Within" Variance

Step 6

Dividing 122 by 9, we get 13.56 as the variance within our three groups; and dividing 56 by 2, 28 as the variance among the means of our three methods. In this problem the null hypothesis asserts that the three sets of scores A, B, and C are random samples drawn from the same parent population and that their M's differ only through sampling accidents. This hypothesis may be tested by computing the ratio "F":

$$F = \frac{\text{between } M\text{'s variance}}{\text{within group variance}}, \quad \text{or in our problem} \quad F = \frac{28}{13.56} = 2.07$$

The significance of F depends upon the degrees of freedom in the numerator and in the denominator of the fraction which determines F. From tables of F,* we find that when the numerator has 2 degrees of freedom and the denominator 9, F must equal 4.26 to be significant at the .05 level of confidence and 8.02 to be significant at the .01 level of confidence.

Our F falls far below the .05 level; hence there is no assurance of any actual differences among our method means. We retain the null hypothesis, since on the present evidence there is no reason to believe our groups to be other than random samples drawn from the same population.
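The partition and F-ratio for Table 41 can be checked from the sums of squares reported in the steps above (178, 14, 20, 78, and 24), without the individual scores. A minimal Python sketch; `f_ratio` is an illustrative name, and the two asserts restate Steps 4 and 5.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Mean squares and their ratio F = between variance / within variance."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


n_per_group = 4
group_means = [11, 15, 10]
grand_mean = 12
ss_means = sum((m - grand_mean) ** 2 for m in group_means)  # Step 2: 14
ss_between = n_per_group * ss_means                          # 4 x 14 = 56
ss_within = 20 + 78 + 24                                     # Step 3: 122
ss_total = 178                                               # Step 1

df_total = 3 * n_per_group - 1                               # 11
df_within = len(group_means) * (n_per_group - 1)             # 9
df_between = len(group_means) - 1                            # 2

assert ss_total == ss_within + ss_between   # Step 4: 178 = 122 + 4 x 14
assert df_total == df_within + df_between   # Step 5: 11 = 9 + 2

ms_b, ms_w, F = f_ratio(ss_between, df_between, ss_within, df_within)
print(ms_b, round(ms_w, 2), round(F, 2))
```

The value 4.26 against which F is judged is the tabled .05 point for 2 and 9 degrees of freedom quoted in Step 6; the sketch does not compute critical values.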

* For F-tables see Snedecor, op. cit., pp. 184-187; or Lindquist, op. cit., pp. 62-65.

In the next section, another problem similar to that of Table 41 is given to illustrate the procedure usually followed in analysis of variance. The data in Table 42 constitute a simple but fundamental experimental design which is often useful.


4. An Illustration of Simple Analysis of Variance When There Is One Criterion of Classification

Example (1) A sensory-motor learning test is administered to groups of subjects under five conditions or methods, designated, respectively, A, B, C, D, and E. Five subjects are assigned at random to each group. Do the mean scores achieved under the five methods differ significantly?


Records for each of the five groups are shown in parallel columns in Table 42. Individual scores are listed under the five headings which designate the conditions under which the learning test was administered. Since "methods" furnishes the only categories, there is said to be one criterion of classification. The first object of our analysis is a breakdown of the total variance (σ²) of the twenty-five scores into two parts: (1) the variance attributable to methods, and (2) the variance attributable to individual differences, i.e., within the several groups. Computation of the sums of squares upon which these variances are based is shown in Table 42 A. A more detailed account of these calculations may be set forth as follows:


Step 1

Calculation of the "correction term." When the SD is calculated from original measures,* the formula

$$\sigma^2 = \frac{\Sigma x'^2}{N} - c^2$$

(x' = deviation from an assumed mean AM, c = correction) becomes

$$\sigma^2 = \frac{\Sigma X^2}{N} - M^2$$

The correction equals the mean (M) directly since AM = 0. Replacing σ² by Σx²/N, we have that

$$\frac{\Sigma x^2}{N} = \frac{\Sigma X^2}{N} - M^2$$

If the correction term M² is written (ΣX)²/N², we may multiply this equation through by N to find that

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$$

In Table 42 the correction term is

$$\frac{(\Sigma X)^2}{N} = \frac{(1135)^2}{25} = 51{,}529.0$$


Step 2

Since Σx² = ΣX² − (ΣX)²/N, we must square and sum the original scores and then subtract the correction term (51,529) in order to find the sum of squares around the mean of all twenty-five scores. In Table 42, squaring and summing, we get a total of 56,641; subtracting 51,529, the final result is 5,112. This sum of squares may also be computed from the deviations around the general mean. The general mean is 45.4; subtracting 45.4 from each score, squaring these deviations and summing, we get 5112, which checks the above.

* See page 62. It is customary in analysis of variance to calculate variances from original measures or scores.


TABLE 42

Scores Made by Five Groups of Students on a Learning Test

Each group consists of five individuals and each group takes the test by a different method. To illustrate analysis of variance when there is one criterion of classification.

                            Methods
              A        B        C        D        E
             35       38       34       55       71
             26       50       26       65       59
             29       50       59       56       43
             37       36       23       71       63
             40       40       60       35       34
  Sums      167      214      202      282      270      1135
  M's       33.4     42.8     40.4     56.4     54.0     GM = 45.4

A. Calculation of Sums of Squares (Computation from Original Measures)

Step 1. Correction term (C) = (1135)²/25 = 1,288,225/25 = 51,529.0

Step 2. Total sum of squares = (35² + 26² + ... + 34²) − C
        = 56,641 − 51,529 = 5112

Step 3. Sum of squares among means of methods A, B, C, D, and E
        = [(167)² + (214)² + (202)² + (282)² + (270)²]/5 − C
        = 53,382.6 − 51,529.0 = 1853.6

Step 4. Sum of squares within methods = 5112 − 1853.6 = 3258.4

B. Analysis of Variance

Source                        df    Sum of Squares    Mean Square (Variance)    SD
Among the means of methods     4        1853.6               463.4
Within methods                20        3258.4               162.9             12.8
Total                         24        5112.0

F = 463.4/162.9 = 2.84          From table (for 4/20 df): F at .05 = 2.87
                                                          F at .01 = 4.43
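Steps 1 to 4 of Table 42 A and the F-ratio of Table 42 B can be sketched as a short Python function that works from original measures, as the text recommends. `one_way_anova` is a hypothetical name, and the sketch does not look up critical values of F; those (2.87 and 4.43 for 4/20 df) still come from the tables.

```python
def one_way_anova(groups):
    """Sums of squares, df, mean squares, and F for one criterion of
    classification, computed from original measures (Table 42 A and B)."""
    scores = [x for g in groups for x in g]
    n_total = len(scores)
    correction = sum(scores) ** 2 / n_total                            # Step 1
    ss_total = sum(x * x for x in scores) - correction                 # Step 2
    ss_among = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # Step 3
    ss_within = ss_total - ss_among                                    # Step 4
    df_among, df_within = len(groups) - 1, n_total - len(groups)
    ms_among, ms_within = ss_among / df_among, ss_within / df_within
    return {"C": correction, "ss_total": ss_total, "ss_among": ss_among,
            "ss_within": ss_within, "df": (df_among, df_within),
            "ms_among": ms_among, "ms_within": ms_within,
            "sd_within": ms_within ** 0.5, "F": ms_among / ms_within}


methods = {
    "A": [35, 26, 29, 37, 40],
    "B": [38, 50, 50, 36, 40],
    "C": [34, 26, 59, 23, 60],
    "D": [55, 65, 56, 71, 35],
    "E": [71, 59, 43, 63, 34],
}

result = one_way_anova(list(methods.values()))
for key, value in result.items():
    print(key, value)
```

Dividing each squared column total by its own column size also handles groups of unequal size, although Table 42 has five scores in every column.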


Step 3

To find the sum of squares attributable to methods we square the sum of each column, add these values, and divide the total by five (the number of individuals in each column). If we now subtract the correction found in Step 1, the resulting sum of squares is 1853.6. As we are still working with original measures, the method of calculation here repeats Step 1, except that we must divide the sum of the squared column totals by the number of scores in each column.


Step 4

The sum of squares within columns (individual variation) always equals the total sum of squares minus the sum of squares among the means of columns. Our within-columns sum is found by subtracting 1853.6 from 5112 to give 3258.4. It may also be calculated directly from the data.*

* For an illustration, see Goulden, op. cit., Example 29, pp. 125-127.

Calculation of the variances from the three sums of squares, and the analysis of the total variance in terms of its two components, is shown in Table 42 B. Each sum of squares must be divided by the number of degrees of freedom allotted to it in order to give the mean square or variance shown in the fourth column under "B." There are twenty-five scores in all in Table 42 and (N − 1) or 24 degrees of freedom. The degrees of freedom for methods are listed as (5 − 1) or 4, less by 1 than the number of methods; and the degrees of freedom within columns are (24 − 4) or 20. This last df may be calculated directly in the following way: there are (5 − 1) or 4 degrees of freedom in each column; and 4 × 5 (number of methods or columns) gives us 20 degrees of freedom for within groups.

The significance of the differences among the means of our five methods can be determined by dividing methods variance by within-groups variance to give the ratio called F. From tables of F* we find that an F of 2.87† represents the ratio which, under the conditions of our problem, is significant at the .05 level; and an F of 4.43† represents the ratio which is significant at the .01 level. Since our calculated F of 2.84 is almost equal to 2.87, we may regard the null hypothesis as barely disproved at the .05 level of confidence. According to the evidence, therefore, the five methods differ significantly.

* For F-tables see Snedecor, op. cit., pp. 184-187; or Lindquist, op. cit., pp. 62-65.
† For 4/20 degrees of freedom.

F furnishes a comprehensive or over-all test of significance. A significant F does not tell us which method is best but simply that one or more differences as between method-means must be significant. If F is not significant there is no point in going further, as no mean difference can be significant. But if F is significant we may proceed to calculate CR's for the differences between column means by the following method.

Means for the five methods are given in Table 42; they vary from 33.4 to 56.4. The best estimate of experimental or individual variability is given by the SD computed from the within-groups variance in "B" of Table 42. This SD is based upon all of our data and gives the variability in the table after the systematic effect of methods-differences has been removed. (Note analogy here to "partial" r, p. 417.) Hence, it is used instead of the SD's calculated from the separate columns, A, B, C, D, and E. The SE of any mean (SE_M) will be

$$SE_M = \frac{12.8}{\sqrt{5}} = 5.7$$

and the SE of any mean difference (SE_D) is

$$SE_D = \sqrt{SE_{M_1}^2 + SE_{M_2}^2} = \sqrt{5.7^2 + 5.7^2} = 8.1$$

From the table of t, for 20 degrees of freedom a t of 2.09 is significant at the .05 level. Hence, since D = t × SE_D, we find, upon substituting 2.09 for t and 8.1 for SE_D, that a difference of 16.9 is significant at the .05 confidence level. Table 43 below gives D's between pairs of M's, and the significance of these differences.


TABLE 43

Methods     Differences     Significant?
A − B          −9.4             no
A − C          −7.0             no
A − D         −23.0             yes
A − E         −20.6             yes
B − C           2.4             no
B − D         −13.6             no
B − E         −11.2             no
C − D         −16.0             no (?)
C − E         −13.6             no
D − E           2.4             no


Both methods D and E are significantly better than A and con- 
siderably better than B and С. But methods D and E are not 
distinguishably different. 
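The comparisons of Table 43 can be reproduced as follows. This Python sketch takes the within-groups SD (12.8), the number of scores per method (5), and t = 2.09 for 20 df directly from the text, and rounds SE_D to 8.1 before multiplying, as the text does. Comparing every pair of means with a single critical difference after a significant F is essentially what later texts call Fisher's least significant difference; `compare_means` is an illustrative name.

```python
import itertools
import math


def compare_means(means, sd_within, n, t_crit):
    """Test every pair of method means against one critical difference
    D = t x SE_D, using the pooled within-groups SD."""
    se_mean = sd_within / math.sqrt(n)
    se_diff = round(math.sqrt(2 * se_mean ** 2), 1)   # the text rounds SE_D to 8.1
    critical = t_crit * se_diff
    results = []
    for a, b in itertools.combinations(means, 2):
        d = round(means[a] - means[b], 1)
        results.append((f"{a}-{b}", d, abs(d) >= critical))
    return se_diff, critical, results


means = {"A": 33.4, "B": 42.8, "C": 40.4, "D": 56.4, "E": 54.0}
se_diff, critical, results = compare_means(means, sd_within=12.8, n=5, t_crit=2.09)
print(f"SE_D = {se_diff}, critical difference = {critical:.1f}")
for pair, d, significant in results:
    print(f"{pair}  {d:6.1f}  {'yes' if significant else 'no'}")
```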

Several additional comments may serve to summarize the steps in the solution of our problem in Table 42:

(1) First, it must be remembered that we are testing the null hypothesis, the hypothesis that there are no differences among method-means. Stated in another way, we are testing the hypothesis that our five groups are in reality random samples drawn from the same normally distributed parent population. The F-test refutes the null hypothesis by demonstrating differences among means which would not arise more than once in twenty trials if the null hypothesis were true. Hence, F is significant at the .05 level of confidence, and our groups cannot be random samples from the same population. The t-test tells us which differences are significant.

(2) The 24 degrees of freedom (1 less than 25, the total number of scores) are broken down into 4 degrees of freedom allotted to the five methods and 20 degrees of freedom allotted to individual variations (within-column variance).

(3) According to the traditional method of treating a problem of this sort, standard deviations around the means of the five scores in each column are first computed. From these SD's, standard errors of the means and standard errors of the differences among means are found. CR's (or t's) are then calculated for the differences between pairs of means and their significances determined from Table 29. Instead of following this procedure, we have computed in Table 42 a single SD based upon the variability within all five columns.
This is a better estimate of the experimental variation within the table than could be found from the five separate SD's, each based upon five scores. Moreover, it represents variability from which systematic method differences have been removed. Justification for pooling scores lies in our original assumption that under the null hypothesis the five groups are random samples from the same population. To be sure, the F-test later disproves this hypothesis; but we may proceed on it as our best assumption until it is (or is not) disproved.


PROBLEMS

1. Two sharp clicking sounds are presented in succession, the second being always more intense or less intense than the first. Presentation is in random order. In eight trials an observer is right six times. Is this result significant?
   (a) Calculate P directly (p. 234).
   (b) Check P found in (a) by the χ²-test (p. 246). Compare P's found with and without correction for continuity.

2. A multiple-choice test of fifty items provides five responses to each item. How many items must a subject answer correctly
   (a) to reach the .05 confidence level?
   (b) to reach the .01 confidence level?

3. A multiple-choice test of thirty items provides three responses for each item. How many items must a subject answer correctly before the chances are only one in fifty that he is merely guessing?


4. A pack of fifty-two playing cards contains four suits (diamonds, clubs, spades, and hearts). A subject "guesses" through the pack of cards, naming only suits, and is right eighteen times.
   (a) Is this result better than "chance"? (Hint: In using the probability curve compute area to 17.5, lower limit of 18.0, rather than to 18.0.)
   (b) Check your answer by the χ²-test (p. 246).
5. Twelve samples of handwriting, six from normal and six from insane adults, are presented to a graphologist who claims he can identify the writing of the insane. How many "insane" specimens must he recognize correctly in order to prove his contention?

6. The following judgments were classified into six categories taken to represent a continuum of opinion:

                      Categories
              I    II   III   IV    V    VI   Total
Judgments:    8    21   42    51    17   5    144

   (a) Test given distribution versus "equal probability" hypothesis.
   (b) Test given distribution versus normal distribution hypothesis.
7. In 120 throws of a single die, the following distribution of faces was obtained:

                          Faces
                1    2    3    4    5    6   Total
Observed
frequencies:   30   25   18   10   22   15    120

Do these results constitute a refutation of the "equal probability" (null) hypothesis?

8. The following table represents the number of boys and the number 
of girls who chose each of the five possible answers to an item in 
an attitude scale. 

        Strongly                                          Strongly
        Approve   Approve   Indifferent   Disapprove   Disapprove   Total
Boys 95 30 10 25 T 5
Girls — 10 15 5 = " ү 


Do these data indicate a significant sex difference in attitude toward this question? (Note: Test the "independence" (null) hypothesis.)


9. The table below shows the number of normals and abnormals who chose each of the three possible answers to an item on a neurotic questionnaire.

               Yes    No    ?    Total
Normals         14    66   10      90
Abnormals       27    66    7     100
Total           41   132   17     190

Does this item differentiate between the two groups? Test the independence hypothesis.


10. From the table below, determine whether Item 27 differentiates between two groups of high and low general ability.

Numbers of Two Groups Differing in General Ability Who Pass Item 27 in a Test

                Passed   Failed   Total
High Ability      31       19       50
Low Ability       24       26       50
Total             55       45      100


11. The four sets of scores below were made at different times under the same conditions. Do they differ significantly? Apply the analysis of variance given on page 260. The F's for 3 and 16 degrees of freedom are, at the .05 level, 3.24; at the .01 level, 5.29.

Set I   Set II   Set III   Set IV
 16       19       17        14
 18       19       18        18
 20       20       14        12
 20       25       16        18
 17       25       12        16
ANSWERS

1. (a) P = .145; not significant.
   (b) P = .145 when corrected; .085 uncorrected.
4. Probability of 18 or better is .08; not significant.
5. 5 or 6 (Probability of 5 or 6 = 37/924 = .04).
6. (a) χ² = 72; P less than .01, and the hypothesis of "equal probability" must be discarded.
   (b) χ² = 11.24; P is less than .05, and the deviation from the normal hypothesis is significant.
7. Yes. χ² = 12.90, df = 5, and P is between .02 and .05.
8. No. χ² = 7.08, df = 4, and P is between .20 and .10.
9. No. χ² = 4.14, df = 2, and P is between .20 and .10.
10. χ² = 1.98, df = 1, and P lies between .20 and .10.
11. Yes. F = 6.55; significant at .01 level.
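The arithmetic behind several of these answers can be checked with a short script. The sketch below is in Python (the book itself gives no code) and recomputes the exact probability of six or more right in eight trials (Problem 1a), the probability of 5 or 6 correct identifications (Problem 5), the χ² values for Problems 6(a), 9, and 10 (without correction for continuity), and the F ratio for Problem 11, which uses a pooled within-groups variance in the manner of paragraph (3) above. The function names are illustrative helpers written for this example, not part of any library.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for a binomial count with n trials and success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def chi_square_fit(observed, expected):
    """Goodness of fit: sum of (fo - fe)^2 / fe over the categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

def chi_square_independence(table):
    """Independence test on a table of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency if independent
            chi2 += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def one_way_f(groups):
    """F = between-groups mean square / pooled within-groups mean square."""
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = len(groups) - 1, len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

p1 = binomial_upper_tail(6, 8)                                    # Problem 1(a)
p5 = (comb(6, 5) * comb(6, 1) + comb(6, 6)) / comb(12, 6)         # Problem 5
chi6 = chi_square_fit([8, 21, 42, 51, 17, 5], [24] * 6)           # Problem 6(a)
chi9, df9 = chi_square_independence([[14, 66, 10], [27, 66, 7]])  # Problem 9
chi10, df10 = chi_square_independence([[31, 19], [24, 26]])       # Problem 10
f11, df_b, df_w = one_way_f([[16, 18, 20, 20, 17], [19, 19, 20, 25, 25],
                             [17, 18, 14, 16, 12], [14, 18, 12, 18, 16]])  # Problem 11

print(f"1(a) P = {p1:.3f}")
print(f"5    P = {p5:.2f}")
print(f"6(a) chi-square = {chi6:.2f}")
print(f"9    chi-square = {chi9:.2f}, df = {df9}")
print(f"10   chi-square = {chi10:.2f}, df = {df10}")
print(f"11   F = {f11:.2f} with {df_b} and {df_w} df")
```

Rounded as in the answer list, these give P = .145, P = 37/924 = .04, χ² = 72, 4.14, and 1.98, and F = 6.55 with 3 and 16 degrees of freedom.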


CHAPTER IX 
LINEAR CORRELATION 


I. THE MEANING OF CORRELATION


1. Correlation as a Measure of Relationship 

In previous chapters we have been concerned with methods of 
computing statistical measures designed to represent in a re- 
liable way the performance of an individual or a group in some 
defined capacity or trait. Frequently, however, it is of more im- 
portance to examine the relationship of one ability to another 
than it is to measure performance in either trait alone. Are 
certain abilities closely related, and others relatively independent? Is it true that good pitch discrimination accompanies
musical achievement; or that bright children tend to be less 
neurotic than average children? If we know the general intelli- 
gence of a child, as measured by a standard test, can we say 
anything about his probable scholastic achievement as repre- 
sented by grades? Problems like these and many others which 
involve the relations among abilities are studied by the method 
of correlation. 

When the relationship between two sets of measures is "linear," i.e., can be described by a straight line,* the correlation between the scores may be expressed by the "product-moment" coefficient of correlation. This coefficient is designated by the letter r. The method of calculating r will be outlined in Section III. Before taking up the details of calculation, we shall first make clear what correlation means, and how r measures relationship.

* See pages 309-311 for a further discussion of "linear" relationship.

Let us consider, first, a situation in which relationship is fixed and unchanging. The circumference of a circle is always 3.1416 times its diameter (C = 3.1416D), and this equation holds no matter how large or how small the circle, or in what part of the world we find it. Each time the diameter of a circle is increased or decreased, the circumference is increased or decreased by just 3.1416 times the same amount. In short, the dependence of circumference upon diameter is complete; hence, the correlation between the two dimensions is said to be perfect, and r = 1.00. In the same fashion, the relationship between two abilities, as represented by two sets of scores, may also be perfect. Suppose, for example, that a hundred students have exactly the same standing in two tests: the student who ranks first in the one test ranks first in the other, the student who ranks second in the first test ranks second in the other, and this one-to-one correspondence holds throughout the entire list. The relationship here is perfect since the relative position of each subject is exactly the same in one test as in the other. The coefficient of correlation is 1.00.

Now let us consider the case in which there is no correlation 
present. Suppose that we have administered to one hundred 
college seniors the Army Alpha Examination and a simple 
“tapping test” in which the number of separate taps made in 
thirty seconds is recorded. Let the mean Alpha score for the 
whole group be 175, and the mean tapping rate be 185 taps in 
thirty seconds. Now suppose that when we divide our group 
into three sub-groups in accordance with the size of their Alpha 
scores, we find that the mean tapping rate of the superior or 
“high” group (whose mean Alpha score is 190) is 184 taps in 
thirty seconds; the mean tapping rate of the “middle” group 
(whose mean Alpha score is 175) is 186 taps in thirty seconds; 
and the mean tapping rate of the “low” group (whose mean 
Alpha score is 160) is 185 taps in thirty seconds. Since the 
tapping rate is almost the same for all three groups,
it is clear that from a student’s tapping rate alone we should be 
unable to draw any conclusion as to his probable performance 
upon Alpha. A tapping rate of 185 is as likely to be found with 
an Army Alpha score of 150, as with one of 175 or even 200. 


In other words, there is no correspondence between the scores made by the members of our group upon the two tests, and hence r, the coefficient of correlation, is zero.*

Perfect relationship, then, is expressed by a coefficient of 1.00, and no relationship at all by a coefficient of .00. Between these two limits, varying degrees of relation are indicated by such coefficients as .33, .65, or .92. A coefficient of correlation falling between .00 and 1.00 always implies some degree of positive association, the degree of the association depending upon the size of the coefficient.

Relationship may be negative as well as positive; that is, a high degree of one trait may be associated with a low degree of another. When negative or inverse relationship is perfect, r = -1.00. To illustrate, suppose that in a small class of ten schoolboys, the boy who stands first in Latin ranks lowest (tenth) in shop work; the boy who stands second in Latin ranks next to the bottom (ninth) in shop work; and that each boy stands just as far from the top of the list in Latin as from the bottom of the list in shop work. Here the correspondence between achievement in Latin and performance in shop work is one-to-one and definite enough, but the direction of the relationship is inverse and r = -1.00. Negative coefficients may range from -1.00 up to .00, just as positive coefficients may range from .00 up to 1.00. Coefficients of -.20, -.50, or -.80 indicate increasing degrees of negative or inverse relationship, just as positive coefficients of .20, .50, and .80 indicate increasing degrees of positive relationship.
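As a minimal sketch in Python (the book gives no code), the two perfect cases just described can be checked numerically with the product-moment formula developed in Section II. The circle diameters below are arbitrary illustrative values, and `pearson_r` is a helper written for this example, not a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment r: sum of xy over sqrt(sum x^2 * sum y^2), x and y deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Circles of arbitrary (illustrative) diameters: C = 3.1416 D exactly.
diameters = [1.0, 2.5, 4.0, 7.2, 10.0]
circumferences = [3.1416 * d for d in diameters]

# Ten schoolboys: first in Latin is tenth in shop work, second is ninth, and so on.
latin_rank = list(range(1, 11))
shop_rank = [11 - rank for rank in latin_rank]

print(f"circles:       r = {pearson_r(diameters, circumferences):.2f}")
print(f"Latin vs shop: r = {pearson_r(latin_rank, shop_rank):.2f}")
```

The first line prints r = 1.00 and the second r = -1.00: the size of the relationship is the same, and only its direction differs.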


2. Correlation Expressed as Agreement between Ranks

The notion underlying correlation can often be most readily comprehended from a simple graphic treatment. Three examples will be given to illustrate values of r of 1.00, -1.00, and approximately .00. Correlation is rarely computed when the number of cases is less than twenty-five, so the examples presented here must be considered to have illustrative value only.

* It may be noted that the number of groups (here 3) is unimportant: any convenient set may be used. The important point is that when the correlation is zero, one cannot predict a person's score on the second test from knowing his score on the first test.

Suppose that four tests, A, B, C, and D, have been administered to a group of five children. The children have been arranged in order of merit on Test A, and their scores are then compared separately with Tests B, C, and D to give the following three cases:


        Case 1                 Case 2                 Case 3
Pupil   A    B         Pupil   A    C         Pupil   A    D
a       15   53        a       15   64        a       15   102
b       14   52        b       14   65        b       14   100
c       13   51        c       13   66        c       13   104
d       12   50        d       12   67        d       12   103
e       11   49        e       11   68        e       11   101


Now if the second series of scores under each case (i.e., B, C, and D) is arranged in order of merit from the highest score down, and the two scores earned by each child are connected by a straight line, we have the following graphs:


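[Graphs: in each case the scores on Test A (left) and on the second test (right) are listed from highest to lowest, and each pupil's two scores are joined by a straight line.]

        Case 1          Case 2          Case 3
        A    B          A    C          A    D
        15   53         15   68         15   104
        14   52         14   67         14   103
        13   51         13   66         13   102
        12   50         12   65         12   101
        11   49         11   64         11   100

Case 1: The connecting lines are all horizontal and parallel. The correlation is positive and perfect, and r = 1.00.
Case 2: All connecting lines intersect in one point. The correlation is negative and perfect, and r = -1.00.
Case 3: No system is exhibited by the connecting lines, but the resemblance is closer to Case 2 than to Case 1. Correlation low and negative.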

The more nearly the lines connecting the paired scores are horizontal and parallel, the higher the positive correlation. The more nearly the connecting lines tend to intersect in one point, the larger the negative correlation. When the connecting lines show no systematic trend, the correlation approaches .00.
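The three cases can also be scored directly. This Python sketch applies the product-moment formula of Section II, through a hypothetical helper `pearson_r` written for this example, to the scores in the table of Cases 1-3. Case 3 comes out at r = -.10, the "low and negative" value its graph suggests.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment r from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

test_a = [15, 14, 13, 12, 11]  # pupils a-e on Test A
cases = {
    "Case 1 (A vs B)": [53, 52, 51, 50, 49],
    "Case 2 (A vs C)": [64, 65, 66, 67, 68],
    "Case 3 (A vs D)": [102, 100, 104, 103, 101],
}
for label, second_test in cases.items():
    print(f"{label}: r = {pearson_r(test_a, second_test):.2f}")
```

The printed values are 1.00, -1.00, and -0.10.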


3. Summary on Correlation

To summarize our discussion up to this point, coefficients of correlation range over a scale which extends from -1.00 through .00 to 1.00. A positive correlation indicates that large amounts of the one variable tend to accompany large amounts of the other; a negative correlation indicates that small amounts of the one variable tend to accompany large amounts of the other. A zero correlation indicates no consistent relationship. We have illustrated above only perfect positive, perfect negative, and approximately zero correlation in order to bring out the meaning of correlation in a striking way. Only rarely, if ever, however, will a coefficient fall at either extreme of the scale, i.e., at 1.00 or -1.00. In most actual problems, calculated r's fall at intermediate points, such as .72, -.26, .50, etc. Such r's are to be interpreted as "high" or "low" depending in general upon how close they are to ±1.00. Interpretation of the degree of relationship expressed by r in terms of various criteria will be discussed later on pages 333-339.


II. THE COEFFICIENT OF CORRELATION *

1. The Coefficient of Correlation as a Ratio

The product-moment coefficient of correlation may be thought of essentially as that ratio which expresses the extent to which changes in one variable are accompanied by (or are dependent upon) changes in a second variable. As an illustration, consider the following simple example, which gives the paired heights and weights of five college seniors:

* This section may be taken up after Section III.


(1)      (2)       (3)      (4)   (5)   (6)    (7)      (8)      (9)
         Ht. in    Wt. in
Student  inches    lbs.
         X         Y        x     y     xy     x/σx     y/σy     (x/σx)(y/σy)
a        72        170       3     0     0     1.34      .00       .00
b        69        165       0    -5     0      .00     -.37       .00
c        66        150      -3   -20    60    -1.34    -1.46      1.96
d        70        180       1    10    10      .45      .73       .32
e        68        185      -1    15   -15     -.45     1.10      -.48
                                        55                        1.80

Mx = 69 in.      σx = 2.24 in.*
My = 170 lbs.    σy = 13.69 lbs.*

$$ r = \frac{\sum (x/\sigma_x)(y/\sigma_y)}{N} = \frac{1.80}{5} = .36 $$


From the X and Y columns it is evident that tall students tend to be somewhat heavier than short students, and hence the correlation between height and weight is almost certainly positive. The mean height is 69 inches, the mean weight 170 pounds, and the σ's are 2.24 inches and 13.69 pounds, respectively. In column (4) are given the deviations (x's) of each man's height from the mean height, and in column (5) the deviations (y's) of each man's weight from the mean weight. The product of these paired deviations (xy's) is a measure of the agreement between individual heights and weights, and the larger the sum of the xy column, the higher the degree of correspondence. When agreement is perfect (and r = 1.00) the xy column has its maximum value. It may be surmised, and with much reason, that the sum of the xy's divided by N (i.e., 55/5 = 11) should give a suitable measure of the relationship between X and Y. Such an average is not a stable measure of relationship, however, as it depends directly upon the units in which height and weight have been expressed, and consequently will vary (as shown in the example below) if centimeters and kilograms, say, are employed instead of inches and pounds. One may avoid the troublesome matter of differences in units by dividing each x and each y by its own σ, i.e., by expressing each deviation as a standard or z-score. The sum of the products of the standard scores, column (9), divided by N will then yield a ratio which, as we shall see later, is a stable expression of relationship. This ratio is the "product-moment"* coefficient of correlation. Its value of .36 indicates a fairly good positive correlation between height and weight in this small sample. The reader should note that our ratio or coefficient is simply the average product of the standard scores of corresponding X and Y measures.

* These σ's were calculated by formula $\sigma = \sqrt{\sum x^2/(N-1)}$ since the samples are small (see p. 189).
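The ratio just described, the average product of paired standard scores, can be sketched in a few lines of Python. Following the footnote to the table, σ is computed with N - 1 in the denominator while the sum of products is divided by N; the function names are illustrative. Note that this mixture of divisors makes the ratio (N - 1)/N times the value obtained when σ and the final division use the same divisor, which for these five students is about .45. The sketch prints both values.

```python
from math import sqrt

heights = [72, 69, 66, 70, 68]       # X, inches, students a-e
weights = [170, 165, 150, 180, 185]  # Y, pounds

def mean(values):
    return sum(values) / len(values)

def sigma_small_sample(values):
    """sigma = sqrt(sum of x^2 / (N - 1)), the formula named in the footnote."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def z_product_ratio(xs, ys):
    """Average product of paired standard scores: sum of (x/sx)(y/sy), divided by N."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sigma_small_sample(xs), sigma_small_sample(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / len(xs)

n = len(heights)
ratio = z_product_ratio(heights, weights)
print(f"sum divided by N:     {ratio:.2f}")                  # the text's figure
print(f"sum divided by N - 1: {ratio * n / (n - 1):.2f}")    # divisor matching sigma
```

The first line reproduces the .36 of the table; the second gives .45.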

Let us now investigate the effect upon our ratio of changing the units in terms of which X and Y have been expressed. In the example below, the heights and weights of the same five students are expressed (to the nearest whole number) in centimeters and kilograms instead of in inches and pounds:

(1)      (2)       (3)      (4)   (5)   (6)    (7)      (8)      (9)
         Ht. in    Wt. in
Student  cms.      kgs.
         X         Y        x     y     xy     x/σx     y/σy     (x/σx)(y/σy)
a        183       77        8     0     0     1.43      .00       .00
b        175       75        0    -2     0      .00     -.32       .00
c        168       68       -7    -9    63    -1.25    -1.43      1.79
d        178       82        3     5    15      .53      .79       .42
e        173       84       -2     7   -14     -.36     1.11      -.40
                                        64                        1.81

Mx = 175 cms.    σx = 5.61 cms.†
My = 77 kgs.     σy = 6.30 kgs.†

$$ r = \frac{\sum (x/\sigma_x)(y/\sigma_y)}{N} = \frac{1.81}{5} = .36 $$

The mean height of our group is now 175 cms. and the mean weight 77 kgs.; the σ's are 5.61 cms. and 6.30 kgs., respectively.

* The sum of the deviations from the mean (raised to some power) and divided by N is called a "moment." When pairs of deviations x and y are multiplied together, summed, and divided by N (to give Σxy/N), the term "product-moment" is used.

† These σ's were calculated by formula $\sigma = \sqrt{\sum x^2/(N-1)}$ since the samples are small.


Note that the sum of the xy column, namely, 64, differs by 9 from the sum of the xy's in the example above, in which inches and pounds were the units of measurement. However, when deviations are expressed as standard scores, the sum of their products, $\sum (x/\sigma_x)(y/\sigma_y)$, divided by N equals .36, as before. The quotient

$$ \frac{\sum (x/\sigma_x)(y/\sigma_y)}{N} $$

is a measure of relationship which remains constant for a given set of data no matter in what units X and Y are expressed. When this ratio is written

$$ r = \frac{\sum xy}{N \sigma_x \sigma_y} $$

it becomes the well-known expression for r, the product-moment coefficient of correlation.*
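To see this invariance numerically, the sketch below converts the same five students' measurements exactly (2.54 cm per inch and 0.45359237 kg per pound, the defined conversion factors) instead of rounding to whole centimeters and kilograms as the table does. It keeps the tables' convention of σ computed with N - 1 and the sum divided by N. The helper names are illustrative. Σxy/N changes with the units, while the standard-score ratio does not.

```python
from math import sqrt

CM_PER_INCH = 2.54         # exact by definition
KG_PER_POUND = 0.45359237  # exact by definition

heights_in = [72, 69, 66, 70, 68]
weights_lb = [170, 165, 150, 180, 185]
heights_cm = [h * CM_PER_INCH for h in heights_in]
weights_kg = [w * KG_PER_POUND for w in weights_lb]

def deviations(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def mean_product(xs, ys):
    """Sum of xy divided by N: changes whenever the units change."""
    return sum(a * b for a, b in zip(deviations(xs), deviations(ys))) / len(xs)

def standard_score_ratio(xs, ys):
    """Sum of (x/sx)(y/sy) divided by N, with sigma = sqrt(sum x^2 / (N - 1))."""
    dx, dy = deviations(xs), deviations(ys)
    n = len(xs)
    sx = sqrt(sum(d * d for d in dx) / (n - 1))
    sy = sqrt(sum(d * d for d in dy) / (n - 1))
    return sum(a / sx * b / sy for a, b in zip(dx, dy)) / n

for label, xs, ys in [("in/lb", heights_in, weights_lb), ("cm/kg", heights_cm, weights_kg)]:
    print(f"{label}: mean product = {mean_product(xs, ys):6.2f}   "
          f"ratio = {standard_score_ratio(xs, ys):.4f}")
```

The mean product moves from 11.00 to about 12.67, while the ratio stays at .3593 in both systems of units.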


2. The Scatter Diagram and the Correlation Table

When N is small, the ratio method described in the preceding section is often employed for computing the coefficient of correlation between two sets of data. When N is large, however, much time and labor may be saved by first arranging the data in the form of a diagram or chart, and then calculating deviations from assumed, instead of from actual, means. Let us consider the diagram in Figure 47. This chart, which is called a "scatter diagram" or "scattergram," represents the paired heights and weights of 120 college students. The construction of a scattergram is a relatively simple matter. Along the left-hand margin from bottom to top are laid off the class-intervals of the height distribution, measurement expressed in inches; and along the top of the diagram from left to right are laid off the class-intervals of the weight distribution, measurement expressed in pounds. Each of the 120 men is represented on the diagram with respect to height and weight. Suppose that a man weighs 150 pounds and is 69 inches tall. His weight locates him in the sixth column from the left, and his height in the third row from the top. Accordingly, a "tally" is placed in the third cell of the sixth column. There are three tallies in all in this cell, that is, there are three men who weigh from 150 to 159 pounds and are 68-69 inches tall. Each of the 120 men is represented by a tally in a cell or square of the table in accordance with the two characteristics, height and weight. Along the bottom of the diagram in the fx row is tabulated the number of men who fall in each weight-interval, while along the right-hand margin in the fy column is tabulated the number of men who fall in each height-interval. The fy column and the fx row must each total 120, the number of men in all. After all of the tallies have been listed, the tallies in each cell are counted and the frequency is entered on the diagram. The scattergram is then a correlation table.

* The coefficient of correlation, r, is often called the "Pearson r" after Professor Karl Pearson, who developed the product-moment method, following the earlier work of Galton and Bravais. See Walker, H. M., Studies in the History of Statistical Method (1929), Chapter 5, pp. 96-111.

[Figure 47: scattergram of Weight in Pounds (X-variable; class-intervals 100-109 to 170-179, left to right) against Height in Inches (Y-variable; class-intervals 60-61 to 72-73, bottom to top), with a tally and cell frequency for each of the 120 students.]

Summary
Weight     fx    Mean ht. for          Height    fy    Mean wt. for
                 given wt. interval                    given ht. interval
170-179     6    70.2                  72-73      1    174.5
160-169     5    68.9                  70-71     16    152.0
150-159     9    68.9                  68-69     28    142.4
140-149    22    67.0                  66-67     33    135.1
130-139    37    66.6                  64-65     26    128.0
120-129    28    65.4                  62-63     13    125.3
110-119    10    64.1                  60-61      3    117.8
100-109     3    62.5
Total     120                          Total    120

FIG. 47. A Scattergram and Correlation Table Showing the Paired Heights and Weights of 120 Students.
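The tallying step can be sketched in Python as follows. The class-intervals match Figure 47 (10-pound weight intervals from 100, 2-inch height intervals from 60), but the seven (weight, height) pairs are hypothetical, chosen only to show how a man who weighs 150 pounds and is 69 inches tall lands in the 68-69 row (third from the top) and the 150-159 column (sixth from the left). The function names are illustrative.

```python
from collections import Counter

# Class-intervals as in Figure 47: weight (X) in 10-pound steps from 100,
# height (Y) in 2-inch steps from 60.
X_START, X_WIDTH = 100, 10
Y_START, Y_WIDTH = 60, 2

def interval(value, start, width):
    """Index of the class-interval holding value (0 = lowest interval)."""
    return int((value - start) // width)

def interval_label(index, start, width):
    low = start + index * width
    return f"{low}-{low + width - 1}"

def tally(pairs):
    """Count (weight, height) pairs into cells keyed by (height row, weight column)."""
    cells = Counter()
    for weight, height in pairs:
        cells[(interval(height, Y_START, Y_WIDTH), interval(weight, X_START, X_WIDTH))] += 1
    return cells

# Hypothetical (weight in pounds, height in inches) pairs, for illustration only.
pairs = [(150, 69), (154, 68), (158, 69), (123, 66), (128, 62), (104, 60), (176, 72)]
cells = tally(pairs)

fx, fy = Counter(), Counter()  # column (weight) and row (height) frequencies
for (row, col), f in cells.items():
    fx[col] += f
    fy[row] += f
assert sum(fx.values()) == sum(fy.values()) == len(pairs)  # both margins total N

for (row, col), f in sorted(cells.items(), reverse=True):  # tallest rows first
    print(f"height {interval_label(row, Y_START, Y_WIDTH)} in., "
          f"weight {interval_label(col, X_START, X_WIDTH)} lbs.: {f}")
```

Keying each cell by its (row, column) interval indexes keeps the fx and fy margins as simple sums over the cells, which is the check the text describes.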

Several interesting facts may be gleaned from the correlation table as it stands. For example, all of the men of a given weight-interval may be studied with respect to the distribution of their heights. In the third column there are twenty-eight men, all of whom weigh 120-129 pounds. One of the twenty-eight is 70-71 inches tall; four are 68-69 inches tall; nine are 66-67 inches tall; seven are 64-65 inches tall; and seven are 62-63 inches tall. In the same way, we may classify all of the men of a given height-interval with respect to weight distribution. Thus, in the row next to the bottom, there are thirteen men, all of whom are 62-63 inches tall. Of this group one weighs 100-109 pounds; two weigh 110-119 pounds; seven weigh 120-129 pounds; one weighs 130-139 pounds; and two weigh 140-149 pounds. It is fairly clear that the "drift" of paired heights and weights is from the upper right-hand section of the diagram to the lower left-hand section. Even a superficial examination of the diagram reveals a fairly marked tendency for heavy, medium, and light men to be tall, medium, and short, respectively; and this general relationship holds in spite of the scatter of heights and weights within any given "array" (an array is the distribution of cases within a given column or row). Even before making any calculations, then, we should probably be willing to estimate the correlation between height and weight to be positive and fairly high.

Let us now go a step further and calculate the mean height of the three men who weigh 100-109 pounds, the men in column 1. The mean height of this group (using the assumed mean method described in Chapter II, p. 41) is 62.5 inches, and this figure has been written in at the bottom of the correlation table. In the same way, the mean heights of the men who fall in each of the succeeding weight-intervals have been written in at the bottom of the diagram. These data have been tabulated in a somewhat more convenient form below the diagram. From this summary, it appears that an actual weight increase of approximately eighty pounds (180-100) corresponds to an increase in mean height of 7.7 inches; that is, the increase from the lightest to the heaviest men is paralleled by an increase of approximately eight inches in mean height. It seems clear, therefore, that the correlation between height and weight is positive.

Let us now shift from height to weight and, applying the method used above, find the change in mean weight which corresponds to the given change in height.* The mean weight of the three men in the bottom row of the diagram is 117.8 pounds. The mean weight of the thirteen men in the next row from the bottom (who are 62-63 inches tall) is 125.3 pounds. The mean weights of the men who fall in the other rows have been written in their appropriate places in the column of row means. In the summary of results we find that in this group of 120 men an increase of about fourteen inches in height is accompanied by an increase of about 56.7 pounds in mean weight. Thus it appears that the taller the man, the heavier he tends to be, and again the correlation between height and weight is seen to be positive.
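The mean of an array can be computed directly from its frequencies and interval midpoints. This Python sketch uses the two arrays listed earlier (column 3, 120-129 pounds, and the 62-63 inch row), with midpoints such as 66.5 for the 66-67 interval. It multiplies each midpoint by its frequency instead of using the assumed-mean shortcut of Chapter II, which gives the same result. `array_mean` is an illustrative helper.

```python
def array_mean(frequencies, midpoints):
    """Mean of a grouped array: sum of (f * midpoint) divided by sum of f."""
    return sum(f * m for f, m in zip(frequencies, midpoints)) / sum(frequencies)

# Column 3 (120-129 lbs.): heights 70-71 down to 62-63 inches.
col3_mean_height = array_mean([1, 4, 9, 7, 7], [70.5, 68.5, 66.5, 64.5, 62.5])

# Row 62-63 inches: weights 100-109 up to 140-149 pounds.
row_62_mean_weight = array_mean([1, 2, 7, 1, 2], [104.5, 114.5, 124.5, 134.5, 144.5])

print(f"mean height, 120-129 lb. column: {col3_mean_height:.1f} in.")
print(f"mean weight, 62-63 in. row:      {row_62_mean_weight:.1f} lbs.")
```

The results, 65.4 inches and 125.3 pounds, agree with the entries for that column and that row in the summary of Figure 47.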


* This change corresponds to the second regression line in the correlation diagram (see p. 280).

3. The Graphic Representation of the Correlation Coefficient

It is often helpful in understanding how the correlation coefficient measures relationship to see how a correlation of .00 or .50, say, looks graphically. Figure 48 (1) pictures a correlation of .50. The data in the table are artificial, and were selected to bring out the relationship in as unequivocal a fashion as possible. The scores laid off along the top of the correlation table from left to right will be referred to simply as the X-test "scores," and the scores laid off at the left of the table from bottom to top as the Y-test "scores." As was done in Figure 47, the mean of each Y-row is entered on the chart, and the means of the X-columns are entered at the bottom of the diagram. The means of each Y-array, that is, the means of the "scores" falling in each X-column, are indicated on the chart by small crosses. Through these crosses a line, called a regression line,* has been drawn. This line represents the change in the mean value of Y over the given range of X. In similar fashion, the means of each X-array, i.e., the means of the scores in each Y-row, are designated on the chart by small circles, through which another line has been drawn. This second regression line shows the change in the mean value of X over the given range of Y. These two lines together represent the "linear" or straight-line relationship between the variables X and Y.

* Regression lines have important properties; they will be defined and discussed more fully in Chapter X.

[Figure 48: four correlation tables of artificial X-test and Y-test "scores" (class-intervals 0-9 to 40-49; N = 64, with frequencies 4, 16, 24, 16, 4 in both margins), each showing column and row means and the two regression lines.
(1) r = .50; column means 14.5, 19.5, 24.5, 29.5, 34.5
(2) r = 1.00; column means 4.5, 14.5, 24.5, 34.5, 44.5
(3) r = .00; column and row means all 24.5
(4) r = -.75; column means 39.5, 32.0, 24.5, 17.0, 9.5]

FIG. 48. The Graphical Representation of the Correlation Coefficient.

The closeness of association or degree of correspondence between the X- and Y-tests is indicated by the relative positions of these two regression lines. When the correlation is positive and perfect, the two regression lines close up like a pair of scissors to form one line. Chart (2) in Figure 48 shows how the two regression lines look when r = 1.00 and the correlation is perfect. Note that the entries in Chart (2) are concentrated along the diagonal from the upper right- to the lower left-hand section of the diagram. There is no "scatter" of scores in the successive columns or rows, all of the scores in a given array being concentrated within one cell. If Chart (2) represented a correlation table of height and weight, we should know that the tallest man was the heaviest, the next tallest man the next heaviest, and that throughout the group the correspondence of height and weight was perfect.

A very different picture from that of perfect correlation is 
presented in Chart (3) where the correlation is .00. Here the 
two regression lines, through the means of the columns and rows, 
have spread out until they are perpendicular to each other. 
There is no change in the mean Y-score over the whole range of 
X, and no change in the mean X-score over the whole range of Y. 
This is analogous to the situation described on page 269, in which 
the mean tapping rate of a group of students was the same for 
those with “high,” “middle,” and “low” Army Alpha scores. 
When the correlation is zero, there is no way of telling from a 
subject’s performance in one test what his performance will be 
in the other test. The best one can do is to select the mean as 
the most probable value of the unknown score. 

Chart (4) in Figure 48 represents a correlation coefficient of -.75. Negative relationship is shown by the fact that the regression lines, through the means of the columns and rows, run from the upper left- to the lower right-hand section of the diagram. The regression lines are closer together than in Chart (1), where the correlation is .50, but are still separated. If this chart represented a correlation table of height and weight, we should know that the tendency was strong for tall men to be light, and for short men to be heavy.
The charts in Figure 48 represent, as was stated above, a linear relationship between sets of artificial test scores. The data were selected so as to be symmetrical around the means of each column and row, and hence the regression lines go through all of the crosses and through all of the circles in the successive columns and rows. It is rarely if ever true, however, that the regression lines pass through all of the means of the columns and rows in a correlation table which represents actual test scores or other real measures. Figure 49, which reproduces the correlation table of heights and weights given on page 276, illustrates this fact. The mean heights of the men in the weight (X) columns are indicated by crosses, and the mean weights of the men in the height (Y) rows by circles, as in Figure 48. Note that the series of short lines joining the successive crosses or circles present a decidedly jagged appearance. Two straight lines have been drawn in to describe the general trend of these irregular lines. These two lines go through, or as close as possible to, the crosses or the circles, more consideration being given to those points near the middle of the chart (because they are based upon more data) than to those at the extremes (which are based upon few scores). Regression lines are called lines of "best fit" because they satisfy certain mathematical criteria to be given later (p. 311). Such lines describe better than any other straight lines the "run" or "drift" of the crosses and circles across the chart.

[Figure 49: the correlation table of Figure 47 redrawn, with Weight in Pounds (X) along the top (100-109 to 170-179) and Height in Inches (Y) at the left; fx = 3, 10, 28, 37, 22, 9, 5, 6 (N = 120). Column means (crosses): 62.5, 64.1, 65.4, 66.6, 67.0, 68.9, 68.9, 70.2. Row means (circles), from the 72-73 row down to the 60-61 row: 174.5, 152.0, 142.4, 135.1, 128.0, 125.3, 117.8. The means are joined by jagged lines, and a straight line is fitted through each set.]

FIG. 49. Graphical Representation of the Correlation between Height and Weight in a Group of 120 College Students (Fig. 47).

In Chapter X we shall develop equations for the "best fitting" lines and show how they may be drawn in to describe the trend of irregular points on a correlation table. For the present, the important fact to get clearly in mind is that when correlation is "linear," the means of the columns and rows in a correlation table can be adequately described by two straight lines, and the closer together these two lines, the higher the correlation, whether positive or negative.
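One simple way to draw a straight line that gives more consideration to the well-populated middle of the chart is a weighted least-squares fit through the column means, each mean counted as many times as there are men in its column. This is a sketch under that assumption, not the equations of Chapter X. Because every man in a column is credited with the column's midpoint weight, the fit amounts to a least-squares line of height on the grouped weights, apart from rounding in the printed means. The data are the column means and fx frequencies of Figure 47; `weighted_line` is an illustrative helper.

```python
weight_midpoints = [104.5, 114.5, 124.5, 134.5, 144.5, 154.5, 164.5, 174.5]
column_means = [62.5, 64.1, 65.4, 66.6, 67.0, 68.9, 68.9, 70.2]  # mean height, inches
fx = [3, 10, 28, 37, 22, 9, 5, 6]                                # men per column

def weighted_line(xs, ys, ws):
    """Weighted least-squares line y = a + b*x; each point counts with weight w."""
    total = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / total
    my = sum(w * y for y, w in zip(ys, ws)) / total
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws))
    sxx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws))
    b = sxy / sxx
    return my - b * mx, b

a, b = weighted_line(weight_midpoints, column_means, fx)
print(f"fitted line: height = {a:.2f} + {b:.4f} * weight")
for x, m, f in zip(weight_midpoints, column_means, fx):
    print(f"{x:6.1f} lbs. (fx = {f:2d}): column mean {m:4.1f}, line {a + b * x:5.2f}")
```

The slope comes out at roughly one-tenth of an inch per pound, and the line passes through the overall means of the two distributions, as a least-squares line must.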


III. THE CALCULATION OF THE COEFFICIENT OF CORRELATION
BY THE PRODUCT-MOMENT METHOD


1. The Calculation of r from a Correlation Table 


Having discussed the meaning of correlation in the last sections, we shall now proceed to the calculation of the coefficient of correlation by the product-moment method. Figure 50 will serve as an illustration of the computations required. This correlation table gives the paired heights and weights of 120 college students, and is derived from the scattergram for the same data shown in Figure 47. The following outline of the steps in the process of calculating r will be best understood if the student will constantly refer to Figure 50 as he reads through each step.

[Figure 50: the correlation table of Figure 47, with Weight in Pounds (X-variable) along the top and Height in Inches (Y-variable) at the left, worked out with the columns and rows needed to compute r by the product-moment method.]

FIG. 50. Calculation of the Product-Moment Coefficient of Correlation between the Heights and Weights of 120 College Students.


Step 1 

Construct a scattergram for the two variables to be corre- 
lated, and from it draw up a correlation table as described on 
page 275. 


Step 2 

The distribution of heights for the 120 men is in the fy column at the right of the diagram. Assume a mean for the height distribution, using the rules given in Chapter II, page 41, and draw double lines to mark off the row in which the assumed mean (AM) falls. The mean for the height distribution has been taken at 66.5 in. (midpoint of interval 66-67), and the y' deviations have been taken from this point. The primes on x' and y' indicate that these deviations are taken from the assumed means of the X and Y distributions (see page 42). Now fill in the fy' and fy'² columns. From the first column cy, the correction in units of interval, is obtained, and this correction together with the sum of the fy'² column will give the σ of the height distribution, σy. As shown by the calculations in Figure 50, the value of σy is 2.62 inches.

The distribution of the weights of the 120 men is in the fx row at the bottom of the diagram. Assume a mean for the weight distribution, and draw double lines to designate the column under the assumed mean (AM). The mean for the weight distribution is taken at 134.5 pounds (midpoint of interval 130-139), and the x' deviations are taken from this point. Fill in the fx' and fx'² rows; from the first calculate cx, the correction in units of interval, and from the second calculate σx, the σ of the entire weight distribution. In Figure 50, the value of σx is found to be 15.54 pounds.
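The Assumed Mean arithmetic of Step 2 can be sketched in a few lines of Python. The function name and the five-interval frequency distribution below are hypothetical (the frequencies of Figure 50 are not reproduced here); the computation is the usual c = Σfx'/N and σ = i√(Σfx'²/N − c²), with i the class interval.

```python
import math

def sigma_assumed_mean(freqs, devs, interval):
    """σ of a grouped distribution by the Assumed Mean method.

    freqs    -- frequency f in each class interval
    devs     -- deviation x' of each interval from the assumed mean, in interval units
    interval -- width of the class interval (e.g. 10 pounds)
    Returns (c, sigma): c in interval units, sigma in score units.
    """
    n = sum(freqs)
    c = sum(f * d for f, d in zip(freqs, devs)) / n            # correction, Σfx'/N
    mean_sq = sum(f * d * d for f, d in zip(freqs, devs)) / n   # Σfx'²/N
    return c, math.sqrt(mean_sq - c * c) * interval

# Hypothetical distribution of 24 cases in five 10-pound intervals.
c, s = sigma_assumed_mean([1, 4, 10, 6, 3], [-2, -1, 0, 1, 2], 10)
print(c, round(s, 2))   # 0.25 10.1
```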




Step 3

The calculations in Step 2 simply repeat the now familiar process of calculating σ by the Assumed Mean method. Our first new task is to fill in the Σx'y' column at the right of the chart. Since the entries in this column may be either + or −, two columns are provided under Σx'y'. Calculation of the entries in the Σx'y' column may be illustrated by considering, first, the single entry in the only occupied cell in the topmost row. The deviation of this cell from the AM of the weight distribution, that is, its x', is four intervals, and its deviation from the AM of the height distribution, that is, its y', is three intervals. Hence, the product of the deviations of this cell from the two AM's is 4 × 3 or 12, and a small figure (12) is placed in the upper right-hand corner of the cell.* The “product-deviation” of the one entry in this cell is 1(4 × 3) or 12 also, and hence a figure 12 is placed in the lower left-hand corner of the cell. This figure shows the product of the deviations of this single entry from the AM's of the two distributions. Since there are no other entries in the cells of this row, 12 is placed at once under the + sign in the Σx'y' column.

Consider now the next row from the top, taking the cells in order from right to left. The cell immediately below the one for which we have just found the product-deviation also deviates four intervals from the AM (wt) (its x' is 4), but its deviation from the AM (ht) is only two intervals (its y' is 2). The product-deviation of this cell, therefore, is 4 × 2 or 8, as shown by the small figure (8) in the upper right-hand corner of the cell. There are three entries in this cell, and since each has a product-deviation of 8, the final entry in the lower left-hand corner of the cell is 3(4 × 2) or 24. The product-deviation of the second cell in this row is 6 (its x' is 3 and its y' is 2), and since there are two entries in the cell, the final entry is 2(3 × 2) or 12. Each of the four entries in the third cell over has a product-deviation of 4 (since x' = 2 and y' = 2), and the final entry is 16. In the fourth cell, each of the three entries has a product-deviation of 2 (x' = 1 and y' = 2), and the cell entry is 6. The entry in the fifth cell over, the cell in the AM (wt) column, is 0, since x' is 0, and accordingly 3(0 × 2) must be 0. Note carefully the entry (−2) in the last cell of the row. Since the deviations of this cell are x' = −1 and y' = 2, the product 1(−1 × 2) = −2, and the final entry is negative. Now we may total up the plus and minus entries in this row and enter the results, 58 and −2, in the Σx'y' column under the appropriate signs.

* We may consider the coördinates of this cell to be x' = 4, y' = 3. The x' is obtained by counting over four intervals from the vertical column containing the AM (wt), and the y' by counting up three intervals from the horizontal row containing the AM (ht). The unit of measurement is the class-interval.

The final entries in the cells for the other rows of the table and the sums of the product-deviations of each row are obtained as illustrated for the two rows above. The reader should bear in mind in calculating x'y''s that the product-deviations of all entries in the cells in the first and third quadrants of the table are positive, while the product-deviations of all entries in the second and fourth quadrants are negative (p. 11). It should be remembered, too, that all entries either in the column headed by the AM of X or in the row headed by the AM of Y have zero product-deviations, since in the one case the x' and in the other the y' equals zero.

Since all entries in a given row have the same y', the arithmetic of calculating x'y''s may often be considerably reduced if each entry in a row-cell is first multiplied by its x', and the sum of these deviations (Σx') multiplied once for all by the common y', viz. the y' of the row. The last two columns, Σx' and Σx'y', contain the entries for the rows. To illustrate the method of calculation, in the second row from the bottom, taking the cells in order from right to left and multiplying the entry in each cell by its x', we have (2 × 1) + (1 × 0) + (7 × −1) + (2 × −2) + (1 × −3) or −12. If we multiply this “deviation-sum” by the y' of the whole row (i.e., by −2), the result is 24, which is the final entry in the Σx'y' column. Note that this entry checks the 28 and −4 entered separately in the Σx'y' column by the longer method. This shorter method is often employed in printed correlation charts and is recommended for use as soon as the student understands fully how the cell entries are obtained.
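The longer (cell-by-cell) and shorter (row Σx' times y') ways of filling in the Σx'y' column can be checked against each other in a short Python sketch. Each row is given as (frequency, x') pairs for its occupied cells; the two rows are the ones worked in the text (y' = 2 and y' = −2). The function names are illustrative only.

```python
def row_products_long(cells, y_dev):
    """Longer method: f(x' × y') for each occupied cell, summed separately by sign."""
    plus, minus = 0, 0
    for f, x_dev in cells:
        product = f * x_dev * y_dev
        if product >= 0:
            plus += product
        else:
            minus += product
    return plus, minus


def row_products_short(cells, y_dev):
    """Shorter method: Σx' for the row, multiplied once for all by the common y'."""
    return sum(f * x_dev for f, x_dev in cells) * y_dev


# Second row from the top (y' = 2), cells listed right to left as (f, x').
row_top = [(3, 4), (2, 3), (4, 2), (3, 1), (3, 0), (1, -1)]
# Second row from the bottom (y' = -2).
row_bottom = [(2, 1), (1, 0), (7, -1), (2, -2), (1, -3)]

print(row_products_long(row_top, 2), row_products_short(row_top, 2))          # (58, -2) 56
print(row_products_long(row_bottom, -2), row_products_short(row_bottom, -2))  # (28, -4) 24
```

The sum of the plus and minus entries from the longer method always equals the single entry from the shorter method.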


Step 4 (Checks) 

The Σx'y' may be checked by computing the product-deviations and summing for columns instead of rows. The two rows at the bottom of the diagram, Σy' and Σx'y', show how this is done. We may illustrate with the first column on the left, taking the cells from top to bottom. Multiplying the entry in each cell by its appropriate y', we have (1 × −1) + (1 × −2) + (1 × −3) or −6. When this entry in the Σy' row is multiplied by the common x' of the column (i.e., by −3), the final entry in the Σx'y' row is 18. The sum of the x'y''s computed from the rows should check the sum of the x'y''s computed from the columns.

Two other useful checks are shown in Figure 50. The Σfy' will equal the total of the Σy' row, and the Σfx' will equal the total of the Σx' column, if no error has been made. Although these columns and rows are designated differently, they denote in each case the sum of the deviations around the AM.
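The Step 4 checks can be sketched the same way. The small correlation table below is hypothetical; it is stored as a mapping from (x', y') to the cell frequency f, and the function returns Σx'y' computed by rows and by columns, together with the two sum checks.

```python
from collections import defaultdict

def check_sums(table):
    """table maps (x', y') -> f. Returns (Σx'y' by rows, Σx'y' by columns, check1, check2)."""
    row_sx = defaultdict(int)   # Σx' for each row, keyed by the row's y'
    col_sy = defaultdict(int)   # Σy' for each column, keyed by the column's x'
    for (xd, yd), f in table.items():
        row_sx[yd] += f * xd
        col_sy[xd] += f * yd
    by_rows = sum(yd * s for yd, s in row_sx.items())   # each row sum times its y'
    by_cols = sum(xd * s for xd, s in col_sy.items())   # each column sum times its x'
    sum_fx = sum(f * xd for (xd, _), f in table.items())
    sum_fy = sum(f * yd for (_, yd), f in table.items())
    return (by_rows, by_cols,
            sum(row_sx.values()) == sum_fx,   # total of Σx' column equals Σfx'
            sum(col_sy.values()) == sum_fy)   # total of Σy' row equals Σfy'

# Hypothetical 3 × 3 table: (x', y') -> f
table = {(-1, 1): 1, (0, 1): 2, (1, 1): 3,
         (-1, 0): 2, (0, 0): 4, (1, 0): 2,
         (-1, -1): 3, (0, -1): 2, (1, -1): 1}
print(check_sums(table))   # (4, 4, True, True)
```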


Step 5

When all of the entries in the Σx'y' column have been made, and the column totaled, the coefficient of correlation may be calculated by the formula

$$ r = \frac{\dfrac{\Sigma x'y'}{N} - c_x c_y}{\sigma_x \sigma_y} \qquad (44) $$

(coefficient of correlation when deviations are taken from the assumed means of the two distributions)*

Substituting 146 for Σx'y', .02 for cy, .18 for cx, 1.31 for σy, 1.55 for σx, and 120 for N, r is found to be .60. (See Fig. 50.)

* This formula for r differs slightly from the ratio formula developed on page 275. The fact that deviations are taken from assumed rather than from actual means makes it necessary to correct Σx'y'/N by subtracting the product of the two corrections cx and cy.




It is very important to remember that cx, cy, σx, and σy are all left in units of class-interval in formula (44). This is done because all product-deviations (x'y''s) are in interval-units, and it is desirable therefore to keep all of the terms in the formula in interval-units. Leaving the corrections and the two σ's in units of class-interval facilitates computation and does not change the result (i.e., the value of the coefficient of correlation).
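That interval units leave r unchanged can be checked numerically. The sketch below applies formula (44) to a hypothetical correlation table, then rebuilds the same cases in score units, assuming for illustration AM's of 134.5 lb and 66.5 in. and intervals of 10 lb and 2 in., and computes r directly from the scores. It also substitutes the summary values quoted in Step 5 for Figure 50. Function names are hypothetical.

```python
import math

def r_formula_44(table):
    """Formula (44); table maps (x', y') -> f, deviations in class-interval units."""
    n = sum(table.values())
    cx = sum(f * xd for (xd, _), f in table.items()) / n
    cy = sum(f * yd for (_, yd), f in table.items()) / n
    sx = math.sqrt(sum(f * xd * xd for (xd, _), f in table.items()) / n - cx ** 2)
    sy = math.sqrt(sum(f * yd * yd for (_, yd), f in table.items()) / n - cy ** 2)
    sxy = sum(f * xd * yd for (xd, yd), f in table.items())
    return (sxy / n - cx * cy) / (sx * sy)

def pearson_r(xs, ys):
    """r from paired scores, deviations taken from the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical table: (x', y') -> f
table = {(-1, 1): 1, (0, 1): 2, (1, 1): 3,
         (-1, 0): 2, (0, 0): 4, (1, 0): 2,
         (-1, -1): 3, (0, -1): 2, (1, -1): 1}
r_intervals = r_formula_44(table)

# The same cases in score units: midpoint = AM + deviation × interval (assumed values).
xs, ys = [], []
for (xd, yd), f in table.items():
    xs += [134.5 + 10 * xd] * f
    ys += [66.5 + 2 * yd] * f
r_scores = pearson_r(xs, ys)

# Summary values quoted in Step 5 for Figure 50 (all in interval units).
r_fig50 = (146 / 120 - 0.18 * 0.02) / (1.55 * 1.31)

print(round(r_intervals, 4), round(r_scores, 4), round(r_fig50, 2))   # 0.3333 0.3333 0.6
```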

Several printed charts are available for use in calculating coefficients of correlation by the product-moment method. The following may be mentioned:

1. The C-D Machine Correlation Chart, by E. E. Cureton and J. W. Dunlap, published by the Macmillan Co., New York, N.Y.

2. Dvorak Correlation Chart, by August Dvorak, published by Longmans, Green and Co., New York, N.Y.

3. Otis Correlation Chart, by Arthur Otis, published by the World Book Co., Yonkers, N.Y.

4. Correlation Chart, by E. F. Lindquist, published by Houghton Mifflin Co., Boston, Mass.

5. The Durost-Walker Correlation Chart, by W. N. Durost and H. M. Walker, published by the World Book Co., Yonkers, N.Y.


2. The Calculation of r from Ungrouped Data

(1) The Formula for r When Deviations Are Taken from the Means of the Two Distributions X and Y

In formula (44), x' and y' deviations are taken from assumed means; hence it is necessary to correct Σx'y'/N by the product of the two corrections, cx and cy (p. 44). When deviations have been taken from the actual means of the two distributions, instead of from assumed means, no correction is needed, as both cx and cy are zero. Under these conditions, formula (44) becomes

$$ r = \frac{\Sigma xy}{N \sigma_x \sigma_y} \qquad (45) $$

(coefficient of correlation when deviations are taken from the means of the two distributions)

which is the ratio for measuring correlation developed on page 275. If we write √(Σx²/N) for σx and √(Σy²/N) for σy, the N's cancel and formula (45) becomes

$$ r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}} \qquad (46) $$

(coefficient of correlation when deviations are taken from the means of the two distributions)

in which x and y are deviations from the actual means as in (45), and Σx² and Σy² are the sums of the squared deviations in x and y taken from the two means.

When N is fairly large, so that the data can be grouped into a correlation table, formula (44) is always used in preference to formulas (45) or (46), as it entails much less calculation. Formulas (45) and (46) may be used to good advantage, however, in finding the correlation between short, ungrouped series (say, twenty-five cases or so). It is not necessary to tabulate the scores into a frequency distribution. An illustration of the use of formula (46) is given in Table 44. The problem is to find the correlation between the scores made by twelve adults on two tests of “controlled association.”

The steps in computing r may be outlined as follows:

Step 1

Find the mean of Test 1 (X) and the mean of Test 2 (Y). The means in Table 44 are 62.5 and 30.4, respectively.

Step 2

Find the deviation of each score on Test 1 from its mean, 62.5, and enter it in column x. Next find the deviation of each score in Test 2 from its mean, 30.4, and enter it in column y.

Step 3

Square all of the x's and all of the y's and enter these squares in columns x² and y², respectively. Total these columns to obtain Σx² and Σy².


TABLE 44

TO ILLUSTRATE THE CALCULATION OF r FROM UNGROUPED SCORES WHEN DEVIATIONS ARE TAKEN FROM THE MEANS OF THE SERIES

           Test 1  Test 2
Subject      X       Y        x       y        x²        y²        xy
   A        50      22     −12.5    −8.4    156.25     70.56    105.00
   B        54      25      −8.5    −5.4     72.25     29.16     45.90
   C        56      34      −6.5     3.6     42.25     12.96    −23.40
   D        59      28      −3.5    −2.4     12.25      5.76      8.40
   E        60      26      −2.5    −4.4      6.25     19.36     11.00
   F        62      30       −.5     −.4       .25       .16       .20
   G        61      32      −1.5     1.6      2.25      2.56     −2.40
   H        65      30       2.5     −.4      6.25       .16     −1.00
   I        67      28       4.5    −2.4     20.25      5.76    −10.80
   J        71      34       8.5     3.6     72.25     12.96     30.60
   K        71      36       8.5     5.6     72.25     31.36     47.60
   L        74      40      11.5     9.6    132.25     92.16    110.40
           750     365                      595.00    282.92    321.50
                                            (Σx²)     (Σy²)     (Σxy)

Mx = 62.5    My = 30.4

$$ r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}} = \frac{321.50}{\sqrt{595 \times 282.92}} = .78 \qquad (46) $$


Step 4

Multiply the x's and y's in the same rows, and enter these products (with due regard for sign) in the xy column. Total the xy column, taking account of sign, to get Σxy.

Step 5

Substitute for Σxy, 321.50; for Σx², 595; and for Σy², 282.92 in formula (46), as shown in Table 44, and solve for r.
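The five steps for Table 44 reduce to a few lines of Python. The sketch below (function and variable names hypothetical) takes deviations from the actual means and evaluates both (45) and (46), which should agree. It carries My at full precision rather than rounding it to 30.4, which changes Σy² only in the third decimal.

```python
import math

# Scores of subjects A-L on the two "controlled association" tests (Table 44).
X = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
Y = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_from_mean_deviations(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n              # Step 1: the two means
    x = [v - mx for v in xs]                        # Step 2: deviations from the means
    y = [v - my for v in ys]
    sum_x2 = sum(a * a for a in x)                  # Step 3: Σx² and Σy²
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))       # Step 4: Σxy, sign included
    r46 = sum_xy / math.sqrt(sum_x2 * sum_y2)       # Step 5: formula (46)
    sigma_x, sigma_y = math.sqrt(sum_x2 / n), math.sqrt(sum_y2 / n)
    r45 = sum_xy / (n * sigma_x * sigma_y)          # formula (45), same value
    return sum_xy, sum_x2, sum_y2, r45, r46

sum_xy, sum_x2, sum_y2, r45, r46 = r_from_mean_deviations(X, Y)
print(round(sum_xy, 2), round(sum_x2, 2), round(sum_y2, 2), round(r46, 2))   # 321.5 595.0 282.92 0.78
```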

While formula (46) is useful in calculating r directly from two ungrouped series of scores, it has the same disadvantage as the “long method” of calculating means and σ's described in Chapters II and III. The deviations x and y when taken from the actual means are usually decimals, and the multiplication and squaring of these values is often a tedious task. For this reason, even when working with short ungrouped series, it is often easier to assume means, calculate deviations from these AM's, and apply formula (44). The procedure is illustrated in Table 45 with the same data given in Table 44. Note that the two means, Mx and My, are first calculated. The corrections, cx and cy, are found by subtracting AMx from Mx and AMy from My (p. 44). Since deviations are taken from assumed means, fractions are avoided, and the calculations of Σx'², Σy'², and Σx'y' are readily made. Substitution in formula (44) then gives r.

TABLE 45

TO ILLUSTRATE THE CALCULATION OF r FROM UNGROUPED SCORES WHEN DEVIATIONS ARE TAKEN FROM THE ASSUMED MEANS OF THE SERIES

           Test 1  Test 2
Subject      X       Y       x'     y'     x'²    y'²    x'y'
   A        50      22      −10     −8     100     64      80
   B        54      25       −6     −5      36     25      30
   C        56      34       −4      4      16     16     −16
   D        59      28       −1     −2       1      4       2
   E        60      26        0     −4       0     16       0
   F        62      30        2      0       4      0       0
   G        61      32        1      2       1      4       2
   H        65      30        5      0      25      0       0
   I        67      28        7     −2      49      4     −14
   J        71      34       11      4     121     16      44
   K        71      36       11      6     121     36      66
   L        74      40       14     10     196    100     140
           750     365       30      5     670    285     334
                          (Σx')  (Σy')  (Σx'²) (Σy'²) (Σx'y')

AMx = 60.0     AMy = 30.0
Mx = 62.5      My = 30.4
cx = 2.5       cy = .4
cx² = 6.25     cy² = .16

σx = √(670/12 − 6.25) = 7.04     σy = √(285/12 − .16) = 4.86

$$ r = \frac{\dfrac{334}{12} - (2.5)(.4)}{7.04 \times 4.86} = \frac{26.83}{34.21} = .78 \qquad (44) $$
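Table 45 can be reproduced with the sketch below, which applies formula (44) to ungrouped scores; each score is its own unit, so no interval factor is needed. The function name is hypothetical. The sketch carries cy = 5/12 exactly rather than the rounded .4, and calling it with a second pair of assumed means shows that the choice of AM's does not affect r.

```python
import math

X = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
Y = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_assumed_means(xs, ys, am_x, am_y):
    n = len(xs)
    xd = [v - am_x for v in xs]          # x' = X - AMx
    yd = [v - am_y for v in ys]          # y' = Y - AMy
    cx, cy = sum(xd) / n, sum(yd) / n    # corrections, c = M - AM
    sx = math.sqrt(sum(d * d for d in xd) / n - cx ** 2)
    sy = math.sqrt(sum(d * d for d in yd) / n - cy ** 2)
    sxy = sum(a * b for a, b in zip(xd, yd))
    r = (sxy / n - cx * cy) / (sx * sy)  # formula (44)
    return sxy, cx, cy, sx, sy, r

sxy, cx, cy, sx, sy, r = r_assumed_means(X, Y, 60, 30)
print(sxy, cx, round(sx, 2), round(sy, 2), round(r, 2))   # 334 2.5 7.04 4.86 0.78
```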


(2) The Calculation of r from Raw Scores, i.e., When Deviations Are Taken from Zero

The calculation of r may often be carried out most readily, especially when a calculating machine is available, by means of the following formula, which is based upon “raw” or obtained scores:

$$ r = \frac{\Sigma XY - N M_x M_y}{\sqrt{\left[\Sigma X^2 - N M_x^2\right]\left[\Sigma Y^2 - N M_y^2\right]}} \qquad (47) $$

(coefficient of correlation calculated from raw or obtained scores)

In this formula, X and Y are obtained scores, and Mx and My are the means of the X and Y series, respectively. ΣX² and ΣY² are the sums of the squared X and Y values, ΣXY is the sum of the products of the paired X and Y scores, and N is the number of cases.

Formula (47) is derived directly from formula (44) by assuming the means of the X and Y tests to be zero. If AMx and AMy are zero, each X and Y score is a deviation from its AM as it stands, and hence we work with the scores themselves. Since the correction, c, always equals M − AM, it follows that when the AM equals 0, cx = Mx, cy = My, and cxcy = MxMy. Furthermore, when cx = Mx and cy = My and the “scores” are “deviations,” the formula

$$ \sigma = \sqrt{\frac{\Sigma f x'^2}{N} - c^2} \times \text{class interval} $$

(see p. 62) becomes

$$ \sigma_x = \sqrt{\frac{\Sigma X^2}{N} - M_x^2} $$

and σy for the same reason equals √(ΣY²/N − My²). If we substitute these values for cxcy, σx, and σy in formula (44) and multiply numerator and denominator by N, the formula for r in terms of raw scores given in (47) is obtained.

An alternate form of (47) is often more useful in practice. This is

$$ r = \frac{N\Sigma XY - \Sigma X \times \Sigma Y}{\sqrt{\left[N\Sigma X^2 - (\Sigma X)^2\right]\left[N\Sigma Y^2 - (\Sigma Y)^2\right]}} \qquad (48) $$

(coefficient of correlation calculated from raw or obtained scores)

This formula is obtained from (47) by substituting ΣX/N for Mx and ΣY/N for My in numerator and denominator, and clearing of fractions by multiplying numerator and denominator by N.

The calculation of r from original scores is shown in Table 46.


The data are again the two sets of twelve scores obtained on the “controlled association” tests, the correlation for which was found to be .78 in Table 44. This short example is for the purpose of illustrating the arithmetic and must not be taken as a recommendation that formula (47) be used only with short series. As a matter of fact, formula (47) or (48) is most useful, perhaps, with long series, especially if one is working with a calculating machine.

TABLE 46

TO ILLUSTRATE THE CALCULATION OF r FROM UNGROUPED DATA WHEN DEVIATIONS ARE ORIGINAL SCORES (AM's = 0)

           Test 1  Test 2
Subject      X       Y        X²       Y²       XY
   A        50      22      2500      484     1100
   B        54      25      2916      625     1350
   C        56      34      3136     1156     1904
   D        59      28      3481      784     1652
   E        60      26      3600      676     1560
   F        62      30      3844      900     1860
   G        61      32      3721     1024     1952
   H        65      30      4225      900     1950
   I        67      28      4489      784     1876
   J        71      34      5041     1156     2414
   K        71      36      5041     1296     2556
   L        74      40      5476     1600     2960
           750     365     47470    11385    23134

Mx = 62.50    My = 30.42 (means to two decimals)

$$ r = \frac{23134 - 12 \times 62.50 \times 30.42}{\sqrt{\left[47470 - 12 \times (62.50)^2\right]\left[11385 - 12 \times (30.42)^2\right]}} = .78 \qquad (47) $$
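Formulas (47) and (48) can be checked against Table 46 with the short sketch below (function names hypothetical). It works entirely with raw-score sums and carries the means at full precision, so its r agrees with the .78 obtained from deviations.

```python
import math

X = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
Y = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def raw_sums(xs, ys):
    """N, ΣX, ΣY, ΣX², ΣY², ΣXY from the obtained scores."""
    return (len(xs), sum(xs), sum(ys),
            sum(a * a for a in xs), sum(b * b for b in ys),
            sum(a * b for a, b in zip(xs, ys)))

def r_formula_47(xs, ys):
    n, sx, sy, sx2, sy2, sxy = raw_sums(xs, ys)
    mx, my = sx / n, sy / n
    return (sxy - n * mx * my) / math.sqrt((sx2 - n * mx ** 2) * (sy2 - n * my ** 2))

def r_formula_48(xs, ys):
    n, sx, sy, sx2, sy2, sxy = raw_sums(xs, ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

print(raw_sums(X, Y))                                               # (12, 750, 365, 47470, 11385, 23134)
print(round(r_formula_47(X, Y), 2), round(r_formula_48(X, Y), 2))   # 0.78 0.78
```

Formula (48) needs no division until the last step, which is why it suits machine calculation.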

The computation by formula (48) is straightforward and the method easy to follow, but the calculations become tedious if the scores are expressed in more than two digits. For this reason, when using formula (48), it will often greatly lessen the arithmetical work if we first “reduce” the original scores by subtracting a constant quantity from each of the original X and Y scores. In Table 47, the same two series of twelve scores have been reduced by subtracting 65 from each of the X scores and 25 from each of the Y scores. The reduced scores, entered in the table under X' and Y', are first squared to give ΣX'² and ΣY'², and then multiplied by rows to give ΣX'Y'. Substitution of these values in formula (48) gives the coefficient of correlation r. If the means of the two series are wanted, these may readily be found by adding to ΣX'/N and ΣY'/N the amounts by which the X and Y scores were reduced (see computations in Table 47).

TABLE 47

TO ILLUSTRATE THE CALCULATION OF r FROM UNGROUPED DATA WHEN DEVIATIONS ARE ORIGINAL SCORES (AM's = 0)

Scores are “reduced” by the subtraction of 65 from each X and 25 from each Y to give X' and Y'.

           Test 1  Test 2
Subject      X       Y       X'     Y'     X'²    Y'²   X'Y'
   A        50      22      −15     −3     225      9     45
   B        54      25      −11      0     121      0      0
   C        56      34       −9      9      81     81    −81
   D        59      28       −6      3      36      9    −18
   E        60      26       −5      1      25      1     −5
   F        62      30       −3      5       9     25    −15
   G        61      32       −4      7      16     49    −28
   H        65      30        0      5       0     25      0
   I        67      28        2      3       4      9      6
   J        71      34        6      9      36     81     54
   K        71      36        6     11      36    121     66
   L        74      40        9     15      81    225    135
           750     365      −30     65     670    635    159
                          (ΣX')  (ΣY')  (ΣX'²) (ΣY'²) (ΣX'Y')

Mx = ΣX'/N + 65 = −30/12 + 65 = 62.5
My = ΣY'/N + 25 = 65/12 + 25 = 30.42

$$ r = \frac{(12 \times 159) - (-30 \times 65)}{\sqrt{\left[12 \times 670 - (-30)^2\right]\left[12 \times 635 - (65)^2\right]}} = \frac{3858}{4923} = .78 \qquad (48) $$
The method of computing r by first reducing the scores is usually superior to the method of applying formula (47) or (48) directly to the raw scores. This is because we deal with smaller whole numbers, and much of the arithmetic can be done mentally. When raw scores have more than two digits, they are cumbersome to square and multiply unless reduced. The student should note that instead of 65 and 25 other constants might have been used to reduce the X and Y scores. If the smallest X and Y scores had been subtracted, namely, 50 and 22, all of the X' and Y' would, of course, have been positive. This is an advantage in calculation, but these reduced scores would have been somewhat larger numerically than are the reduced scores in Table 47. In general, the best plan in reducing scores is to subtract constants which are close to the means. The reduced scores are then both plus and minus, but are numerically about as small as we can make them.
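The effect of reducing the scores can be seen in the sketch below (function name hypothetical). It applies formula (48) to the scores reduced by 65 and 25, as in Table 47, and again by the smallest scores, 50 and 22; both give exactly the same numerator and r, and the means are recovered by adding the constants back.

```python
import math

X = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
Y = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_reduced(xs, ys, kx, ky):
    """Formula (48) applied to X' = X - kx and Y' = Y - ky; also returns the two means."""
    xr = [v - kx for v in xs]
    yr = [v - ky for v in ys]
    n = len(xr)
    sx, sy = sum(xr), sum(yr)
    sx2 = sum(a * a for a in xr)
    sy2 = sum(b * b for b in yr)
    sxy = sum(a * b for a, b in zip(xr, yr))
    numerator = n * sxy - sx * sy
    r = numerator / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    means = (sx / n + kx, sy / n + ky)   # add back the reducing constants
    return (sx, sy, sx2, sy2, sxy), numerator, r, means

sums, num, r1, (mx, my) = r_reduced(X, Y, 65, 25)
_, _, r2, _ = r_reduced(X, Y, 50, 22)
print(sums, num, round(r1, 2), round(r2, 2), mx, round(my, 2))
# (-30, 65, 670, 635, 159) 3858 0.78 0.78 62.5 30.42
```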
(3) The Calculation of r by the Difference-Formula

It is apparent from the preceding sections that the product-moment formula for r may be written in several ways, depending upon whether deviations are taken from actual or assumed means, and upon whether raw scores or deviations are employed. The present section contributes still another formula for calculating r, namely, the difference-formula. This formula will complete our list of expressions for r, as it is believed that the student who understands the meaning and use of the correlation formulas given in this chapter will have no difficulty with other variations which he may encounter.*

The formula for r by the difference method is

$$ r = \frac{\Sigma x^2 + \Sigma y^2 - \Sigma d^2}{2\sqrt{\Sigma x^2 \times \Sigma y^2}} \qquad (49) $$

(coefficient of correlation by difference-formula, deviations from the means of the distributions)

in which Σd² = Σ(x − y)². Since Σ(x − y)² = Σx² + Σy² − 2Σxy, the numerator of (49) equals 2Σxy, so that (49) is algebraically identical with (46).

* See the following article, which lists fifty-two variations of the r-formula: “Variations of the Product-Moment (Pearson) Coefficient of Correlation,” Journal of Educational Psychology, 17 (1926), 458-469.


The principal advantage of the difference-formula is that no cross products (xy's) need be computed. For this reason, this formula is employed in several of the printed correlation charts. Formula (49) is illustrated in Table 48 with the same data used in Table 44 and elsewhere in this chapter. Note that the x, y, x², and y² columns repeat Table 44. The d or (x − y) column is found by subtracting algebraically each y-deviation from its corresponding x-deviation. These differences are then squared and entered in the d² or (x − y)² column. Substitution of Σx², Σy², and Σd² in formula (49) gives r = .78.

TABLE 48

TO ILLUSTRATE THE CALCULATION OF r FROM UNGROUPED DATA BY THE DIFFERENCE-FORMULA, DEVIATIONS FROM THE MEANS

           Test 1  Test 2                  d                              d²
Subject      X       Y        x       y     (x − y)      x²        y²     (x − y)²
   A        50      22     −12.5    −8.4     −4.1    156.25     70.56      16.81
   B        54      25      −8.5    −5.4     −3.1     72.25     29.16       9.61
   C        56      34      −6.5     3.6    −10.1     42.25     12.96     102.01
   D        59      28      −3.5    −2.4     −1.1     12.25      5.76       1.21
   E        60      26      −2.5    −4.4      1.9      6.25     19.36       3.61
   F        62      30       −.5     −.4      −.1       .25       .16        .01
   G        61      32      −1.5     1.6     −3.1      2.25      2.56       9.61
   H        65      30       2.5     −.4      2.9      6.25       .16       8.41
   I        67      28       4.5    −2.4      6.9     20.25      5.76      47.61
   J        71      34       8.5     3.6      4.9     72.25     12.96      24.01
   K        71      36       8.5     5.6      2.9     72.25     31.36       8.41
   L        74      40      11.5     9.6      1.9    132.25     92.16       3.61
           750     365                               595.00    282.92     234.92

Mx = 62.5    My = 30.4

$$ r = \frac{595.00 + 282.92 - 234.92}{2\sqrt{595 \times 282.92}} = .78 \qquad (49) $$

Another form of the difference-formula is often useful, especially in machine calculation. This version makes use of raw or obtained scores:

$$ r = \frac{N\left[\Sigma X^2 + \Sigma Y^2 - \Sigma (X - Y)^2\right] - 2(\Sigma X)(\Sigma Y)}{2\sqrt{\left[N\Sigma X^2 - (\Sigma X)^2\right]\left[N\Sigma Y^2 - (\Sigma Y)^2\right]}} \qquad (50) $$

(coefficient of correlation by difference-formula, calculation from raw or obtained scores)

in which Σ(X − Y)² is the sum of the squared differences between the two sets of scores.
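A sketch of the two difference-formulas follows, again with hypothetical function names and the Table 44 scores. Formula (49) works from deviations about the actual means and (50) from raw scores; both reproduce the r of .78, and neither forms the xy cross products.

```python
import math

X = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
Y = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_formula_49(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    x = [v - mx for v in xs]
    y = [v - my for v in ys]
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))   # Σd² = Σ(x - y)²
    return sum_d2, (sum_x2 + sum_y2 - sum_d2) / (2 * math.sqrt(sum_x2 * sum_y2))

def r_formula_50(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sx2 = sum(a * a for a in xs)
    sy2 = sum(b * b for b in ys)
    sd2 = sum((a - b) ** 2 for a, b in zip(xs, ys))    # Σ(X - Y)²
    numerator = n * (sx2 + sy2 - sd2) - 2 * sx * sy
    denominator = 2 * math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return sd2, numerator / denominator

d2, r49 = r_formula_49(X, Y)
D2, r50 = r_formula_50(X, Y)
print(round(d2, 2), D2, round(r49, 2), round(r50, 2))   # 234.92 12587 0.78 0.78
```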


IV. RELIABILITY OF THE COEFFICIENT OF CORRELATION

1. The Standard and Probable Errors of a Coefficient of Correlation

The usual formulas for the standard and probable errors of a coefficient of correlation are

$$ \sigma_r = \frac{1 - r^2}{\sqrt{N - 1}} \qquad (51) $$

and

$$ PE_r = \frac{.6745(1 - r^2)}{\sqrt{N - 1}} \qquad (52) $$

(standard and probable errors of a coefficient of correlation)

The PE_r formula is the more often used, perhaps because the PE has become established in the literature as the result of long usage. When r = .60 and N = 120 (see height and weight problem in Fig. 47), PE_r = .04 to two decimals [from (52)]. This probable error is taken to mean that the chances are 50 in 100 (odds 1:1) that the obtained r of .60 does not miss the true or population value by more than ±.04.

There are two serious objections to the use of formulas (51) and (52). In the first place, the r in these formulas is really the true or population r. Since we do not have the true r, we must substitute the calculated or sample r in the formula in order to get an estimate of the standard or probable error. If the obtained r is in error, our estimate will also be in error; and at best it is approximate.

In the second place, the sampling distribution of r is not normal except when the population r is .00 and N is large. When r is high (.80 or more) and N is small, the sampling distribution of r is skewed and the PE is decidedly misleading. The reason for skewness in the sampling distribution of high r's grows out of the fact that the range of r's is definitely limited at +1.00 and −1.00. Suppose, for example, that r = .80 and N = 20. Then in a new sample of twenty cases the probability of an r less than .80 is much greater than the probability of an r greater than .80 because of the obtained r's nearness to unity. The distribution of r's obtained from successive samples of twenty cases will be skewed negatively (p. 119), and the skewness to the left will increase as r increases. For small and intermediate values of r, say between ±.50, and for N's of 100 or more, the distribution of r in successive samples will conform fairly closely to the normal curve, and formulas (51) and (52) will yield useful estimates of reliability. But unless used with caution, PE_r is likely to be misleading.
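The negative skewness described above can be seen in a small Monte Carlo experiment, which is our illustration rather than anything in the text. The sketch assumes normally distributed scores in the population, draws 2000 samples of twenty cases with a population correlation of .80 (and, for contrast, .00), and measures the skewness of the resulting r's. The seeds, number of samples, and function names are arbitrary choices.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_rs(rho, n, reps, seed=1):
    """r's from `reps` samples of n cases drawn from a normal population with correlation rho."""
    rng = random.Random(seed)
    k = math.sqrt(1 - rho * rho)
    rs = []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(z1)
            ys.append(rho * z1 + k * z2)   # population correlation with xs is rho
        rs.append(pearson_r(xs, ys))
    return rs

def skewness(values):
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rs_high = sample_rs(0.80, 20, 2000)
rs_zero = sample_rs(0.00, 20, 2000, seed=2)
# Clearly negative skew for rho = .80; close to zero for rho = .00.
print(round(skewness(rs_high), 2), round(skewness(rs_zero), 2))
```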

It has been customary for a long time to regard an r as worthy of confidence if it is at least four times its PE. If r = .20 and N = 40, PE_r = .10, and our r is twice its PE. On the assumption that the true r in the population is zero, the obtained r of .20 (since it is only 2 PE from zero) could well be attributed to sampling errors, and hence is not significant. When N is 150, however, the correlation coefficient of .20 is four times its probable error of .05 and can hardly be attributed solely to accidents of sampling.
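Formulas (51) and (52), with the √(N − 1) denominator, and the traditional four-times-PE rule can be put in a few lines. The sketch below (function names hypothetical) evaluates the three examples used in this section.

```python
import math

def se_r(r, n):
    """Standard error of r, formula (51)."""
    return (1 - r * r) / math.sqrt(n - 1)

def pe_r(r, n):
    """Probable error of r, formula (52): .6745 times the standard error."""
    return 0.6745 * se_r(r, n)

for r, n in [(0.60, 120), (0.20, 40), (0.20, 150)]:
    pe = pe_r(r, n)
    # PE to two decimals, and how many PE's the obtained r lies from zero
    print(r, n, round(pe, 2), round(r / pe, 1))
```

The ratios are computed from the unrounded PE's, so they come out slightly below the round "twice" and "four times" of the text.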


2. Testing the Reliability of a Coefficient of Correlation Against the Null Hypothesis

The significance of an obtained r may be tested more exactly against the null hypothesis than in terms of PE_r. Assuming the population r to be zero, the method consists in comparing the t value for the obtained r with the t's to be expected by chance at the .05 and .01 levels (see Table 29). The t for a given r is found from the formula

$$ t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}} \qquad (53) $$

(t for determining the significance of a computed r on the null hypothesis)

in which r = the obtained coefficient and N = the number of cases. The significance of this t may be read from Table 29, page 190, which is entered with N − 2 degrees of freedom. To illustrate, suppose r = .60 and N = 120 (p. 283). Then from (53)

$$ t = \frac{.60\sqrt{118}}{\sqrt{1 - .36}} = 8.15 $$

Entering Table 29 with 118 degrees of freedom (N − 2 = 118), we find that t at the .05 level is 1.98, and at the .01 level, 2.62. Since our t is far larger than the second of these values, we conclude forthwith that the null hypothesis is clearly rejected and our r is very significant. The probability that we should have obtained an r of .60, if the true r were .00, is much less than .01.
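Formula (53) can be written as a one-line Python function; the sketch below is minimal and the function name is hypothetical. The critical values 1.98 and 2.62 for 118 degrees of freedom are copied from the text's reading of Table 29 rather than computed.

```python
import math

def t_for_r(r, n):
    """t for testing an obtained r against a population r of zero, formula (53); df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t = t_for_r(0.60, 120)
T_05, T_01 = 1.98, 2.62          # Table 29 entries for 118 df, as quoted in the text
print(round(t, 2), t > T_05, t > T_01)   # 8.15 True True
```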

A simpler method of testing the significance of an r than by computing t is to enter Table 49 with N − 2 degrees of freedom and compare our sample r with the tabulated entries. Two significance levels, .05 and .01, appear in Table 49. The table is read as follows: Suppose r = .60 and N = 120. Then for 118 degrees of freedom the entries at .05 and .01 are, by linear interpolation, .180 and .235, respectively. This means that only five times in 100 trials would an r as large as ±.180 appear by accidents of sampling if the population r were actually .00, and only once in 100 trials would an r of ±.235 appear if the population r were .00. It is clear that the obtained r of .60, since it is much larger than .235, is very significant. Another way of stating the same conclusion is to say that we may be confident at the .01 level that the true r is not zero. Figure 51 represents the situation outlined in the example above. The entries in Table 49 were found by substituting for N and for t in (53), the t's being taken from the .05 and .01 columns in Table 29.

TABLE 49

CORRELATION COEFFICIENTS AT THE 5% AND 1% LEVELS OF SIGNIFICANCE

Example: When N is 52 and (N − 2) is 50, an r must be .273 to be significant at the .05 level, and .354 to be significant at the .01 level.

Degrees of                      Degrees of
 freedom     .05      .01        freedom     .05      .01
 (N − 2)                         (N − 2)
    1       .997    1.000          24       .388     .496
    2       .950     .990          25       .381     .487
    3       .878     .959          26       .374     .478
    4       .811     .917          27       .367     .470
    5       .754     .874          28       .361     .463
    6       .707     .834          29       .355     .456
    7       .666     .798          30       .349     .449
    8       .632     .765          35       .325     .418
    9       .602     .735          40       .304     .393
   10       .576     .708          45       .288     .372
   11       .553     .684          50       .273     .354
   12       .532     .661          60       .250     .325
   13       .514     .641          70       .232     .302
   14       .497     .623          80       .217     .283
   15       .482     .606          90       .205     .267
   16       .468     .590         100       .195     .254
   17       .456     .575         125       .174     .228
   18       .444     .561         150       .159     .208
   19       .433     .549         200       .138     .181
   20       .423     .537         300       .113     .148
   21       .413     .526         400       .098     .128
   22       .404     .515         500       .088     .115

Fig. 51. When the true r is zero and N = 120 (118 df), 5% of sample r's exceed ±.180, and 1% exceed ±.235. [Figure: sampling distribution of r centered at r = .00, with r = ±.180 marked at t = ±1.98 and r = ±.235 at t = ±2.62.]

It will be noted from Figure 51 that Table 49 takes account of both ends of the sampling distribution; that is, it does not consider the sign of r. When N = 120, the probability (P/2) of an r of +.180 or more, on the null hypothesis, is .025; and the probability (P/2) of an r of −.180 or less is also .025. For a P/2 of .01 (or P = .02), the r by interpolation between .05 (.180) and .01 (.235) is .221. On the null hypothesis, therefore, only once in 100 trials would a positive r of .221 or a larger value arise through sampling accidents.
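How Table 49 follows from (53), and the two interpolations used above, can be sketched as follows. Solving (53) for r gives r = t/√(t² + df); the t's are again the Table 29 values quoted in the text, and `interpolate` is a hypothetical helper for straight-line interpolation. The r's computed directly from t differ from the interpolated .180 and .235 only in the third decimal.

```python
import math

def critical_r(t, df):
    """r that yields the tabled t for df = N - 2; formula (53) solved for r."""
    return t / math.sqrt(t * t + df)

def interpolate(x, x0, y0, x1, y1):
    """Straight-line interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Directly from (53) with the Table 29 t's for 118 df.
print(round(critical_r(1.98, 118), 3), round(critical_r(2.62, 118), 3))

# From Table 49 by interpolating between the rows for 100 and 125 df.
r05 = interpolate(118, 100, 0.195, 125, 0.174)
r01 = interpolate(118, 100, 0.254, 125, 0.228)
# P/2 = .01 (P = .02), interpolating between the P = .05 and P = .01 values.
r02 = interpolate(0.02, 0.05, round(r05, 3), 0.01, round(r01, 3))
print(round(r05, 3), round(r01, 3), round(r02, 3))   # 0.18 0.235 0.221
```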

The .05 and .01 levels in Table 49 are the only ones we will need ordinarily in evaluating the significance of a calculated r. Several illustrations of the use of Table 49 are given below:

Size of       Degrees of        Calculated
Sample (N)    Freedom (N − 2)       r        Interpretation
    10              8              .70       significant at .05, not at .01 level
   152            150             −.12       not significant
    27             25              .50       significant at .05, hardly at .01 level
   500            498              .20       very significant
   100             98             −.30       very significant


It is clear from these examples that even a small r may be significant if computed from a large sample, while an r as high as .70 may not be very significant if N is small. Table 49 is especially useful when N is small, as it is here that the PE of an r is most apt to be misleading. Suppose, for example, that we have calculated an r of .55 for a sample of twelve cases. The PE of this r, by formula (52), is .14, and since the r of .55 is about four times its PE, we might conclude that our correlation is very significant. From Table 49, however, we note that for 10 degrees of freedom (N − 2 = 10), an r must be .708 to be significant at the .01 level. Furthermore, an r must be .675 before the probability is .01 that this r or a larger value will occur on the null hypothesis (at P = .02, r = .675 by interpolation between .05 and .01 in Table 49). For this small sample, a conclusion as to significance based upon the PE_r would clearly be in error.
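The contrast between the PE rule and Table 49 for N = 12 can be checked with the sketch below. The .05 and .01 entries for 10 degrees of freedom are copied from Table 49; the PE uses formula (52) with the √(N − 1) denominator, and t is computed from (53) for comparison. The dictionary name is hypothetical.

```python
import math

r, n = 0.55, 12
pe = 0.6745 * (1 - r * r) / math.sqrt(n - 1)      # formula (52)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # formula (53), df = 10
TABLE_49_DF10 = {".05": 0.576, ".01": 0.708}      # Table 49 entries for 10 df

print(round(pe, 2), round(r / pe, 1), round(t, 2))   # 0.14 3.9 2.08
for level, r_needed in TABLE_49_DF10.items():
    print(level, "significant" if r >= r_needed else "not significant")
```

Although r is nearly four times its PE, it falls short of the .05 entry, so the table does not support the conclusion the PE rule suggests.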

The interpretation of the significance of a low r should always be tentative. Even when small r's are significant by our tests, it is a good plan to repeat the experiment on another sample before announcing a final decision.


3. Testing the Reliability of a Correlation Coefficient by 
Fisher’s z-Function 

R. A. Fisher* has shown that r can be transformed into a
new statistic called z which is normally or nearly normally
distributed (p. 297) no matter what the size of r. A further
advantage of z is that its standard error depends entirely upon the
size of the sample and is independent of the calculated value of z.
The z corresponding to any given r may be read from a table
provided by Fisher.

The significance of any given r may be determined by transforming
it into a z, calculating the SE of z, and applying tests of
significance. If z divided by its SE is greater than 2.58 (Table 29),
the null hypothesis may be safely discarded. The transformed
r, or z, may also be used in testing the significance of the difference
between two r's. When our r's are obtained from independent
random samples, formula (29) may be used, the SE's of the z's
being substituted in that formula. When two or more
r's are obtained from the same sample, the z transformation is no
longer strictly applicable, but an approximate method may still
be employed.†
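A minimal sketch of the z test follows. It assumes the usual definitions z = ½ logₑ[(1 + r)/(1 − r)] and SE_z = 1/√(N − 3), a standard error that depends only on N as stated above, and for two independent samples it uses the familiar rule SE of a difference = √(SE₁² + SE₂²). The function names are hypothetical, and the second pair of r's and N's is an invented illustration, not data from the text.

```python
import math


def fisher_z(r):
    """Fisher's transformation of r: z = 1/2 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def se_z(n):
    """Standard error of z; it depends only on the sample size N."""
    return 1 / math.sqrt(n - 3)


def critical_ratio_r(r, n):
    """z divided by its SE; beyond 2.58 the null hypothesis may be discarded."""
    return fisher_z(r) / se_z(n)


def critical_ratio_difference(r1, n1, r2, n2):
    """Ratio for the difference between two r's from independent samples."""
    se_diff = math.sqrt(se_z(n1) ** 2 + se_z(n2) ** 2)
    return (fisher_z(r1) - fisher_z(r2)) / se_diff


print(round(critical_ratio_r(0.55, 12), 2))                       # r = .55, N = 12
print(round(critical_ratio_difference(0.60, 120, 0.40, 100), 2))  # hypothetical r's
```

For r = .55 with N = 12 the ratio is only about 1.9, well short of 2.58, which agrees with the Table 49 verdict on that small sample.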


* Fisher, R. A., Statistical Methods for Research Workers (8th ed., 1941),
pp. 190-203.

† Lindquist, E. F., Statistical Analysis in Educational Research (1940),
pp. 217-218.


4. Averaging Coefficients of Correlation


It is a fairly common practice to average correlation coefficients
computed from tests given to comparable groups in order
to obtain a generalized picture of the relationship between the
two variables. The averaging of r's is a dubious and often an
incorrect procedure. (1) r's do not vary along a linear scale, so
that the increase from .40 to .50 does not mean the same increase
in relationship as does an increase from .80 to .90. (2) When
+ r's and − r's are averaged, they tend to cancel each other out.
Thus the mean of an r of .60 and an r of −.60 is .00, and two
substantial measures of correlation combine to give a result
which indicates no real relationship. When r's do not differ
greatly in size, an arithmetic mean will yield a result which is
often useful; but this is not true when r's differ widely in size
or in sign. Averaging an r of .70 and an r of .60 to obtain .65 is
permissible; but averaging an r of .90 and an r of .10 to obtain
.50 is not.

The safest plan is not to average r's at all. When for various
reasons averaging seems to be demanded by the problem, the
best method is to transform the r's into z's and take the arithmetic
mean of the z's. An average r may then be obtained
from the average z.
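The recommended procedure (transform each r to z, average the z's, convert back) takes only a few lines. This sketch uses Python's `math.atanh` and `math.tanh`, which are exactly Fisher's transformation and its inverse; the function name `average_r` is hypothetical, and the plain unweighted mean assumes groups of comparable size (weighting each z by N − 3 is a common variant not described in the text).

```python
import math


def average_r(rs):
    """Average r's by way of Fisher's z: transform, take the mean, transform back."""
    zs = [math.atanh(r) for r in rs]  # atanh(r) = 1/2 * ln((1 + r) / (1 - r))
    return math.tanh(sum(zs) / len(zs))


for pair in [(0.70, 0.60), (0.90, 0.10), (0.60, -0.60)]:
    print(pair, "arithmetic mean:", round(sum(pair) / 2, 2),
          "via z:", round(average_r(pair), 3))
```

For .70 and .60 the two methods nearly agree (.65 against about .653), while for .90 and .10 the z method gives about .66 rather than .50, which shows why the text rejects the simple mean when r's differ widely.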


PROBLEMS 


1. Find the correlation between the two sets of scores given below, 
using the ratio method (p. 272). 


Subjects    X    Y
   a       15   40
   b       18   42
   c       22   50
   d       17   45
   e       19   43
   f       20   46
   g       16   41
   h       21   41


2. The scores given below were achieved upon Army Alpha and Typewriting
Tests by 100 students in a typewriting class. The typewriting
scores are in number of words written per minute, with
certain penalties. Find the coefficient of correlation and its PE.
Check the significance of r by Table 49. Use an interval of five
units for Y and an interval of ten units for X.


Typing (Y)  Alpha (X)    Typing (Y)  Alpha (X)    Typing (Y)  Alpha (X)
    46         152           26         164           40         120
    31          96           33         127           36         140
    46         171           44         144           43         141
    40         172           35         160           48         143
    49         138           49         106           45         138
    41         154           40          95           58         149
    30         127           57         146           23         142
    46         156           23         175           45         166
    34         156           51         126           44         138
    48         133           35         120           47         150
    48         173           41         154           29         148
    38         134           28         146           46         166
    26         179           32         154           46         146
    37         159           50         159           39         107
    34         167           29         175           49         139
    51         136           41         164           34         183
    47         153           32         111           41         150
    39         145           49         164           49         179
    32         134           58         119           31         138
    37         184           35         160           47         136
    26         154           48         149           40         172
    40          90           40         149           30         145
    53         143           43         143           40         109
    46         173           38         159           38         158
    39         168           37         157           29         115
    52         187           41         153           43          93
    47         166           51         149           55         163
    81         172           40         163           37         147
    33         180           35         175           52         169
    22         147           31         133           38          75
    46         150           23         178           39         152
    44         150           37         168           32         159
    37         143           46         156           42         150


3. In the correlation table given below compute the coefficient of correlation.
Test the significance of r (p. 297).

[Correlation table: Boys, Ages 4.5 to 5.5 Years. Columns: Weight in Pounds (X), intervals 29-33, 34-38, 39-43, 44-48, 49-53, and Totals. Rows include the intervals 45-47, 42-44, 39-41, and 36-38, and a Totals row.]


4. In the following correlation table compute the coefficient of correlation
and test its significance.

[Correlation table: School Marks (rows include 90 and over, 85-89, 80-84) against Army Alpha I.Q.'s (columns begin with 84 and lower).]


5. Compute the coefficient of correlation between the Algebra Test
scores and I.Q.'s shown in the table below and test its significance.

[Correlation table: Algebra Test Scores against I.Q.'s, with Totals.]


6. Compute the correlation between the two sets of scores given below 
(a) when deviations are taken from the means of the two series 
[use formula (46)]; 
(b) when the means are taken at zero. First reduce the scores by 
subtracting 150 from each of the scores in Test 1, and 40 from 
each of the scores in Test 2. 
(c) Test the significance of r. 


Test 1   Test 2     Test 1   Test 2
  150      60         139      41
  126      40         155      43
  135      45         147      37
  176      50         162      58
  138      56         156      48
  142      43         146      39
  151      57         133      31
  163      38         168      46
  137      41         153      52
  178      55         150      57


7. Find the correlation between the two sets of memory-span scores
given below (the first series is arranged in order of size) (a) when
deviations are taken from assumed means [formula (44)], (b) by the
difference-method given on page 295. Test the significance of r.


    Test 1          Test 2
(digit span)    (letter span)
     15              12
     14              14
     13              10
     12               8
     11              12
     11               9
     11              12
     10               8
     10              10
     10               9
      9               8
      9               7
      8               7
 (illegible)          8
      7               6


8. Fill in the following table: 


 Size of     Degrees of        r          Significance
 Sample       Freedom
   (N)        (N − 2)
(a)  15          13           .68
(b)  30          28       (illegible)
(c)  82          80          −.30
(d) 225         223           .05
ANSWERS

1. r = .60
2. r = .05; PE = .07; not significant
3. r = .71; highly significant, beyond .01 level
4. r = .406; highly significant, beyond .01 level
5. r = .52; highly significant, beyond .01 level
6. r = .41; r not significant at .05 level
7. r = .78; significant beyond .01 level
8. (a) very significant (beyond .01 level)
   (b) not significant
   (c) very significant
   (d) not significant


CHAPTER X 
REGRESSION AND PREDICTION 


I. THE REGRESSION EQUATIONS


1. The Problem of Predicting One Variable from Another 
Suppose that in a group of 120 college students (p. 283), we
wish to estimate a certain man's height knowing his weight to be
153 pounds. The best possible “guess” that we can make of
this man's height is the mean height of all of the men who fall
in the 150-159 weight-interval. In Figure 52 the mean height
of the nine men in this column is 68.9 inches, which is, therefore,
the most likely height of a man who weighs 153 pounds. In the
same way, the most probable height of a man who weighs 136
pounds is 66.6 inches, the mean height of the thirty-seven men
who fall in weight-column 130-139 pounds. And, in general,
the most probable height of any man in the group is the mean
of the heights of all of the men who weigh the same (or approximately
the same) as he, i.e., who fall within the same weight-column.

Turning to weight, we can make the same kind of estimates.
Thus, the best possible “guess” that we can make of a man's
weight knowing his height to be 66.5 inches is 135.1 pounds,
viz., the mean weight of the thirty-three men who fall in the
height-interval 66-67 inches. Again, in general, the most probable
weight of any man in the group is the mean weight of all of
the men who are of the same (or approximately the same)
height.

Our illustration shows that from the scatter diagram alone it
is possible to “predict” one variable from another. But the
prediction is rough, and is obviously subject to a large “error of
estimate” (see page 320). Moreover, while we have made use of the fact
that the means are the most probable points in our arrays
(columns or rows), we have made no use of our knowledge concerning
the correlation between the two variables. The two
regression lines* in Figure 52 are definitely determined by the
correlation between height and weight, and their degree of
separation indicates the size of the correlation coefficient
(p. 279). Consequently, they describe the relationship between
height and weight over the whole range more regularly, and in a
more generalized fashion, than do the series of short straight
lines joining the means (see also p. 282). A knowledge of
the equations of these lines is necessary if we are to make as
accurate a prediction as our data will permit. For example,
given the weight (X) of a man comparable to those in our group,
on substituting in the equation connecting Y and X we are able
to predict this man's height more accurately than if we took the
mean of his height array. The task of the next section will be
to develop equations for the two regression lines by means of
which precise predictions from X to Y or from Y to X can be
accomplished.

[Figure 52: Illustrating Positions of Regression Lines and Calculation of the Regression Equations (see Fig. 50, p. 283). Correlation table of Height in Inches (Y) against Weight in Pounds (X), weight intervals 100-109 through 170-179, N = 120; $M_y$ = 66.5 inches. Calculations shown with the figure:]

Calculation of Regression Equations

I. Deviation Form

$$(1)\quad \bar{y} = .60 \times \frac{2.62}{15.54}\,x = .10x$$
$$(2)\quad \bar{x} = .60 \times \frac{15.54}{2.62}\,y = 3.56y$$

II. Score Form

$$(1)\quad \bar{Y} - 66.5 = .10(X - 136.3) \quad\text{or}\quad \bar{Y} = .10X + 52.9$$
$$(2)\quad \bar{X} - 136.3 = 3.56(Y - 66.5) \quad\text{or}\quad \bar{X} = 3.56Y - 100.4$$

Calculation of Standard Errors of Estimate

$$\sigma_{(\text{est. } Y)} = 2.62\sqrt{1 - .60^2} = 2.10 \text{ inches}$$
$$\sigma_{(\text{est. } X)} = 15.54\sqrt{1 - .60^2} = 12.43 \text{ pounds}$$

For plotting on the chart, the regression equations are written with $\sigma_x$ and $\sigma_y$ in class-interval units, viz. $\bar{y} = .51x$ and $\bar{x} = .71y$ (see p. 316).


2. The Two Regression Equations in Deviation Form 

The equations of the two regression lines in a correlation table
represent the straight lines which “best fit” the means of the
successive columns and rows in the table. Using as a definition
of “best fit” the criterion of “least squares”† (that is, the line
that makes the sum of the squared deviations of the observations
from it as small as possible), Pearson worked out the equation of
the line which goes through, or as close as possible to, more of the
column-means than any other straight line; and also the equation
of the line which goes through, or as close as possible to, more of
the row-means than any other straight line. These two lines are
“best fitting” in a mathematical sense, the one to the observations
of the columns and the other to the observations of the rows.

* The term “regression” was first used by Francis Galton with reference
to the inheritance of stature. Galton found that children of tall
parents tend to be less tall, and children of short parents less short, than
their parents. In other words, the heights of the offspring tend to “move
back” toward the mean height of the general population. This tendency
toward maintaining the “mean height” Galton called the principle of regression,
and the line describing the relationship of height in parent and
offspring was called a “regression line.” The term is still employed, although
its original meaning of “stepping back” to some stationary average
is not necessarily implied (see p. 331).

† For a treatment of the method of least squares as applied to the
problem of fitting regression lines, see Walker, H. M., Elementary
Statistical Method (1943), pp. 308-310.

The equation of the first regression line, the line drawn to
represent the crosses in Figure 52, is as follows:

$$\bar{y} = r\frac{\sigma_y}{\sigma_x}\,x \qquad (54)$$

(regression equation of y on x, deviations taken from
the means of Y and X)

The factor $r\frac{\sigma_y}{\sigma_x}$ is called the regression coefficient, and is often
replaced in (54) by the term $b_{yx}$ or $b_{12}$, so that formula (54) may
be written $\bar{y} = b_{yx}\,x$ or $\bar{x}_1 = b_{12}\,x_2$. The bar over the y
means that our estimate is an average value.

If we substitute in formula (54) the values of r, $\sigma_y$, and $\sigma_x$
obtained from Figure 52, we have

$$\bar{y} = .60 \times \frac{2.62}{15.54}\,x \quad\text{or}\quad \bar{y} = .10x$$


This equation gives the relationship of deviations from mean
height to deviations from mean weight. When x = 1.00, $\bar{y}$ = .10;
that is, a deviation of one pound from the mean of the X's (weight)
is accompanied by a deviation of .10 inch from the mean of the
Y's (height). The man who stands one pound above the mean
weight of the group, therefore, is most probably .10 inch above
the mean height. Since this man's weight is 137.3 pounds
(136.3 + 1.00), his height is most probably 66.6 inches (66.5
+ .10). Again, the man who weighs 120 pounds, i.e., is 16.3
pounds below the mean of the group, is most probably 64.9
inches tall, or about 1.6 inches below the mean height of the
group. To get this last value, substitute x = −16.3 in the
equation above to get $\bar{y}$ = −1.63, and refer this value to its
mean (66.5 − 1.63 = 64.87). The regression equation is, in effect,
a generalized statement. It tells us that the most probable deviation
of an individual in our group from the M (ht) is just .10 of his
deviation from the M (wt).

The equation $\bar{y} = r\frac{\sigma_y}{\sigma_x}x$ gives the relationship between y
and x in deviation form. This designation is necessary because
the two variables are expressed as deviations from their respective
means (i.e., as x and y); hence, for a given deviation
from $M_x$ the equation gives the most probable accompanying
deviation from $M_y$.

The equation of the second regression line, the line drawn
through the means of the rows in Figure 52, is

$$\bar{x} = r\frac{\sigma_x}{\sigma_y}\,y \qquad (55)$$

(regression equation of x on y, deviations taken from
the means of X and Y)

As in the first regression equation, the regression coefficient
$r\frac{\sigma_x}{\sigma_y}$ is often replaced by the expression $b_{xy}$ or $b_{21}$, and formula (55)
written $\bar{x} = b_{xy}\,y$ or $\bar{x}_2 = b_{21}\,x_1$.
If we substitute for r, $\sigma_x$, and $\sigma_y$ in formula (55), we have

$$\bar{x} = .60 \times \frac{15.54}{2.62}\,y \quad\text{or}\quad \bar{x} = 3.56y$$

from which it is evident that a deviation of 1 inch from the
M (ht), or from 66.5 inches, is accompanied by a deviation of
3.56 pounds from the M (wt), or from 136.3 pounds. Expressed
generally, the most probable deviation of any man from the
mean weight is just 3.56 times his deviation from the mean
height. Accordingly, a man 67 inches tall, or .5 inch above the
mean height (66.5 + .5 = 67), most probably weighs 138.1
pounds, or is 1.8 pounds above the mean weight (136.3 + 1.8).
(Substitute y = .5 in the equation to get $\bar{x}$ = 1.8.)


The equation $\bar{x} = r\frac{\sigma_x}{\sigma_y}y$ gives the relationship between x and y
in deviation form. That is to say, it gives the most probable
deviation of an X-measure from $M_x$ corresponding to a known
deviation in the Y-measure from $M_y$.
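The two deviation-form equations can be written as small functions. The sketch below uses the values of r, σx, σy and the means quoted for Figure 52; the function names are hypothetical. Carrying full precision instead of the rounded .10 and 3.56 changes the predictions only in the second decimal place.

```python
# Figure 52 constants quoted in the text: height Y (inches), weight X (pounds)
r, sd_y, sd_x = 0.60, 2.62, 15.54
mean_y, mean_x = 66.5, 136.3

b_yx = r * sd_y / sd_x  # regression coefficient in (54), about .10
b_xy = r * sd_x / sd_y  # regression coefficient in (55), about 3.56


def predict_y_dev(x):
    """Formula (54): most probable height deviation for a weight deviation x."""
    return b_yx * x


def predict_x_dev(y):
    """Formula (55): most probable weight deviation for a height deviation y."""
    return b_xy * y


print(round(mean_y + predict_y_dev(120 - mean_x), 1))  # man weighing 120 pounds
print(round(mean_x + predict_x_dev(67 - mean_y), 1))   # man 67 inches tall
print(round(b_yx * b_xy, 2))  # product of the two coefficients equals r squared
```

The last line shows why the two equations are not interchangeable: solving one backward would give the other only if the product of the coefficients were 1, and that product is r², which equals 1 only when r = ±1.00.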
Although both of the regression equations given above involve
x and y, the two equations cannot be used interchangeably:
neither can be used to predict both x and y. This is an
important fact which the reader must understand clearly and
constantly bear in mind. The first regression equation,
$\bar{y} = r\frac{\sigma_y}{\sigma_x}x$, can be used only when y is to be predicted from a
given x (when y is the “dependent” variable)*; while the
second equation, $\bar{x} = r\frac{\sigma_x}{\sigma_y}y$, can be used only when x is to be
predicted from a known y (when x is the “dependent” variable).
There are always two regression equations in a correlation table,
the one through the means of the columns and the other through
the means of the rows, unless the correlation is 1.00 or −1.00.

When r = 1.00, $\bar{y} = r\frac{\sigma_y}{\sigma_x}x$ becomes $\bar{y} = \frac{\sigma_y}{\sigma_x}x$, or $\bar{y}\sigma_x = x\sigma_y$.
Also, when r = 1.00, $\bar{x} = r\frac{\sigma_x}{\sigma_y}y$ becomes $\bar{x} = \frac{\sigma_x}{\sigma_y}y$, or
$\bar{x}\sigma_y = y\sigma_x$. In short, when the correlation is perfect (±1.00),
the two equations are identical and the two regression lines
coincide. To illustrate this situation, suppose that the correlation
between height and weight in Figure 52 were perfect.
The first regression equation would then be $\bar{y} = 1.00 \times \frac{2.62}{15.54}x$,
or $\bar{y} = .17x$, and the second, $\bar{x} = 1.00 \times \frac{15.54}{2.62}y$, or $\bar{x} = 5.93y$.
Algebraically, the equation x = 5.93y is equal to y = .17x; for
if we solve y = .17x for x, we obtain x = 5.93y (apart from rounding).
When r = ±1.00 there is only one equation and a single regression
line. Moreover, if r = ±1.00, and in addition $\sigma_x = \sigma_y$, the single
regression line makes an angle of 45° or 135° with the horizontal
axis, since y = ±x.


* The dependent variable takes its value from the other (independent)
variable in the equation. For example, in the equation $y = 3x^2 + 5x - 10$,
y “depends” for its value upon x; hence y is the dependent variable.


3. Plotting the Regression Lines in a Correlation Table * 


In Figure 52, the coördinate axes have been drawn in on the
correlation table through the means of the X- and Y-distributions.
The vertical axis is drawn through 136.3 pounds ($M_x$),
and the horizontal axis through 66.5 inches ($M_y$). These axes
intersect close to the center of the chart. Equations (54) and
(55) define straight lines which pass through the origin, or point
of intersection of these coördinate axes. For this reason, it is a
comparatively simple task to plot our regression lines on the
correlation chart with reference to the given coördinate axes.

[Figure 53: Plot of the Straight Line, y = 2x.]

* A brief review of the equation of a straight line, and of the method
of plotting a simple linear equation, is given here in order to simplify the
plotting of the regression equations.

In Figure 53, let X and Y be coördinate axes, or axes of reference. Now
suppose that we are given the equation y = 2x and are required to represent
the relation between x and y graphically. To do this we assign values
to x in the equation and compute the corresponding values of y. When
x = 2, for example, y = 2 × 2 or 4; when x = 3, y = 2 × 3 or 6. In the
same way, given any x-value we can compute the value of y which will
“satisfy” the equation, that is, make the left side equal to the right. If
the series of x and y values found from the equation are plotted on the
diagram with respect to the X- and Y-coördinates (as in Fig. 53), they
will be found to fall along a straight line. This straight line pictures the
relation y = 2x. It goes through the origin, since when x = 0, y = 0.
The equation y = 2x represents, then, a straight line which passes through
the origin; and the relation of its coördinates (points lying along the line)
is such that $y/x$, called the slope of the line, is always equal to 2.

The general equation of any straight line which passes through the
origin may be written y = mx, where m is the slope of the line. If we
replace m in the general formula by $r\frac{\sigma_y}{\sigma_x}$, it is clear that the regression
line in deviation form, namely, $\bar{y} = r\frac{\sigma_y}{\sigma_x}x$, is simply the equation of a
straight line which goes through the origin. For the same reason, when the
general equation of a straight line through the origin is written x = my,
$\bar{x} = r\frac{\sigma_x}{\sigma_y}y$ is also seen to be a straight line through the origin, its
slope being $r\frac{\sigma_x}{\sigma_y}$.

Correlation charts are usually laid out with equal distances
representing the X and Y class-intervals (the printed correlation
charts are always so constructed), although the intervals expressed
in terms of the variables themselves may be, and often
are, unequal and incommensurable. This is true in Figure 52.
In this diagram, the intervals in X and Y appear to be equal,
although the actual interval for height is 2 inches, and the actual
interval for weight is 10 pounds. Because of this difference in
interval-length in the two variables, it is very important that we
express $\sigma_x$ and $\sigma_y$ in our regression equations in class-interval
units before plotting the regression lines on the chart. Otherwise
we must equate our X and Y intervals by laying out our
diagram in such a way as to make the X-interval five times the
Y-interval. This latter method of equating intervals is impractical,
and is rarely used, since all we need do in order to use
correlation charts drawn up with equal intervals is to express
$\sigma_x$ and $\sigma_y$ in formulas (54) and (55) in units of interval. When
this is done, and the interval, not the score, is the unit, the first
regression equation becomes

$$\bar{y} = .60 \times \frac{1.31}{1.554}\,x \quad\text{or}\quad \bar{y} = .51x$$

and the second

$$\bar{x} = .60 \times \frac{1.554}{1.31}\,y \quad\text{or}\quad \bar{x} = .71y$$


Since each regression line goes through the origin, only one
other point (besides the origin) is needed in order to determine
its course. In the first regression equation, if x = 10, $\bar{y}$ = 5.1;
and the two points (0, 0) and (10, 5.1) locate the line. In the
second regression equation, if y = 10, $\bar{x}$ = 7.1; and the two
points (0, 0) and (7.1, 10) determine the second line. In plotting
points on a diagram any convenient scale may be employed.
A millimeter rule is useful.
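The conversion to interval units is simply a division of each σ by its interval width (10 pounds for weight, 2 inches for height, as stated above). This sketch, with hypothetical variable names, recomputes the two slopes and the second plotting point for each line.

```python
r = 0.60
interval_x, interval_y = 10, 2     # pounds per weight interval, inches per height interval
sd_x_units = 15.54 / interval_x    # sigma of weight in interval units, 1.554
sd_y_units = 2.62 / interval_y     # sigma of height in interval units, 1.31

slope_y_on_x = r * sd_y_units / sd_x_units  # about .51
slope_x_on_y = r * sd_x_units / sd_y_units  # about .71

# Second point (besides the origin) for each line, as (x, y) in interval units
line_1 = [(0, 0), (10, round(10 * slope_y_on_x, 1))]
line_2 = [(0, 0), (round(10 * slope_x_on_y, 1), 10)]
print(line_1, line_2)
```

As the text warns next, these slopes serve only for drawing on an equal-interval chart; predictions in pounds and inches must use the score-unit σ's.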

It is important for the reader to remember that when the
two σ's are expressed in interval units, the regression equations do
not give the relationship between the X and Y score deviations.
These special forms of the regression equations should not be
used except when plotting the equations on a correlation chart.
Whenever the most probable deviation in the one variable corresponding
to a known deviation in the other is wanted, formulas
(54) and (55), in which the σ's are expressed in score units, must
be employed.


4. The Regression Equations in Score Form 

In the last sections it was pointed out that formulas (54) and
(55) give the equations of the regression lines in deviation form,
that is, that the values of x and y substituted in these equations
are deviations from the means of the X and Y distributions, and are
not scores. While the equations in deviation form are actually
all that one needs in order to pass from one variable to another,
it is decidedly convenient to be able to estimate an individual's
actual score in Y, say, directly from the score in X without first
converting the X-score into a deviation from $M_x$. This can be
done by using the score form of the regression equations. The
conversion of deviation form to score form is made as follows:
Denoting the mean of the Y's by $M_y$ and any Y-score simply
by Y, we may write the deviation of any individual from the
mean as $Y - M_y$ or, in general, $y = Y - M_y$. In the same way,
$x = X - M_x$ when x is the deviation of any X-score from the
mean of the X's. If we substitute $Y - M_y$ for y, and $X - M_x$ for x,
in formulas (54) and (55), the two regression equations become

$$\bar{Y} - M_y = r\frac{\sigma_y}{\sigma_x}(X - M_x) \quad\text{or}\quad \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - M_x) + M_y \qquad (56)$$

and

$$\bar{X} - M_x = r\frac{\sigma_x}{\sigma_y}(Y - M_y) \quad\text{or}\quad \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - M_y) + M_x \qquad (57)$$

(regression equations of Y on X and X on Y in score form)


These two equations are now said to be in score form, since the
X and Y in both equations represent actual scores, and not
deviations from the means of the two distributions.

If we substitute in (56) the values of $M_y$, r, $\sigma_y$, $\sigma_x$, and $M_x$
obtained from Figure 52, the regression of height on weight in
score form becomes

$$\bar{Y} = .60 \times \frac{2.62}{15.54}(X - 136.3) + 66.5$$

or upon reduction

$$\bar{Y} = .10X + 52.9$$


To illustrate the use of this equation, suppose that a man in our
group weighs 160 pounds and we wish to estimate his most probable
height. Substituting 160 for X in the equation gives $\bar{Y}$ = 68.9,
or about 69 inches; accordingly, the most probable height of a man who
weighs 160 pounds is 69 inches.

If the problem is to predict weight instead of height, we
must use the second regression equation, formula (57). Substituting
for $M_x$, r, $\sigma_x$, $\sigma_y$, and $M_y$ in (57), we have

$$\bar{X} = .60 \times \frac{15.54}{2.62}(Y - 66.5) + 136.3$$

or

$$\bar{X} = 3.56Y - 100.4$$

Now if a man is 71 inches tall, we find, on replacing Y by 71 in
the equation, that $\bar{X}$ = 152.4. Hence the most probable weight
of a man who is 71 inches tall is about 152½ pounds.
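The reduction from formula (56) or (57) to a slope and an intercept is mechanical, as this sketch shows (the helper `score_form` is hypothetical). Because it keeps full precision instead of first rounding the slope to .10, it gives an intercept near 52.7 rather than the 52.9 above; the predictions still agree with the text to about a tenth of a unit.

```python
def score_form(r, sd_dep, sd_ind, mean_dep, mean_ind):
    """Return (slope, intercept) of formula (56) or (57) in score form."""
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind
    return slope, intercept


# Figure 52: height Y (inches) and weight X (pounds)
r, sd_y, sd_x, mean_y, mean_x = 0.60, 2.62, 15.54, 66.5, 136.3

a_y, c_y = score_form(r, sd_y, sd_x, mean_y, mean_x)  # Y on X, formula (56)
a_x, c_x = score_form(r, sd_x, sd_y, mean_x, mean_y)  # X on Y, formula (57)

print(f"Y = {a_y:.3f}X + {c_y:.1f};  X = {a_x:.3f}Y {c_x:+.1f}")
print(round(a_y * 160 + c_y, 1))  # most probable height at 160 pounds
print(round(a_x * 71 + c_x, 1))   # most probable weight at 71 inches
```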


5. The Meaning of a “Prediction” from the Regression 
Equation 

It may seem strange, perhaps, to talk of “predicting” a man's
height from his weight, when the heights and weights of the
120 men in our group are already known. When we have
measures of both height and weight it is unnecessary to estimate
one from the other. But suppose that all we know about a
given individual is his weight and the fact that he falls within
the age-range of our group of 120 men. Since we know the
correlation between height and weight to be .60, it is possible
from the regression equation to predict the most probable
height of our subject in lieu of actually measuring him. Furthermore,
the regression equation may be employed to estimate the
height of any man in the population from which our group is
chosen, provided our sample is an unbiased selection from the
larger group. A regression equation holds, of course, only for
the population from which the sample group was drawn. We
cannot estimate the heights of children or of women from a
regression equation which describes the relationship between
height and weight in men between the ages of eighteen and
twenty-five years (the age-range of the students in our group).
Conversely, we cannot expect a regression equation established
for elementary school children to hold for older groups.

Height and weight, since they are both easily measured, perhaps
do not demonstrate the value of the regression equation so
clearly as do other and more complex traits. These variables
were chosen for our “model” problem because they are objective
and observable and their meaning is definite. Let us now
consider a problem of more direct psychological interest. Suppose
that in a group of 300 high-school children of nearly the
same age, the correlation between group test scores obtained at
the beginning of the school year and average grades made in the
first year of high school is .60. Now if we administer the group
test to a child who enters school the next year, it is possible from
his score to estimate his probable scholastic performance by
means of the regression equation between test score and grades
obtained from the previous year's class. Forecasts of this sort
are useful in educational prognosis and guidance.* The same
is true of vocational guidance; we are often able to predict from
a test battery the probable success of an individual who contemplates
entering a certain trade or profession.† Advice on
such a basis is measurably better than subjective judgment.

* Edgerton, H. A., Academic Prognosis in the University, Educational
Psychology Monographs, 27 (1930).

† Stead, W. H., and Shartle, C. L., Occupational Counseling Techniques

II. THE RELIABILITY OF PREDICTIONS


1. The Standard Error of Estimate 

The values of X and Y “predicted” from regression equations
have been constantly referred to as being the “most
probable” values of the one variable accompanying the given
value of the other. In order to show just how probable such
estimates are, it is necessary that we calculate their standard
errors of estimate. The accuracy with which we are able to
predict Y-scores from equation (56) is given by the formula

$$\sigma_{(\text{est. } Y)} = \sigma_y\sqrt{1 - r^2} \qquad (58)$$

[standard error of a Y-score predicted from equation (56)]*

in which $\sigma_y$ is the σ of the Y distribution, and r is the coefficient
of correlation. The subscript “est.” is used to distinguish this
standard error from the σ of the distribution, the $\sigma_M$, etc.

From formula (56) we have calculated the most probable
height of a man weighing 160 pounds to be 69 inches. The reliability
of this prediction is obtained by substituting $\sigma_{(\text{ht})}$ and r
in formula (58) to find

$$\sigma_{(\text{est. } Y)} = 2.62\sqrt{1 - .60^2} = 2.1 \text{ inches}$$

We now say that the most probable height of a man weighing
160 pounds is 69 inches with a $\sigma_{(\text{est. } Y)}$ of 2.1 inches; and that the
chances are about two in three that our prediction does not miss
the man's actual height by more than ±2.1 inches. We may
feel quite certain that the estimated height of this man does not
miss his true height by more than $\pm 3\sigma_{(\text{est. } Y)}$, or by more than
±6.3 inches.

The degree of accuracy with which X-scores can be predicted
from (57) is given by the formula

$$\sigma_{(\text{est. } X)} = \sigma_x\sqrt{1 - r^2} \qquad (59)$$

[standard error of an X-score predicted from equation (57)]

in which $\sigma_x$ is the σ of the X distribution, and r is the coefficient
of correlation.

* The probable error of estimate is $PE_{(\text{est. } Y)} = .6745\,\sigma_y\sqrt{1 - r^2}$.

We found on page 318 that the most probable weight of a man
in our group who is 71 inches tall is 152.4 pounds. The $\sigma_{(\text{est. } X)}$
of this prediction, from (59), is

$$\sigma_{(\text{est. } X)} = 15.54\sqrt{1 - .60^2} = 12.4 \text{ pounds}$$

and the most probable weight of any man 71 inches tall, in our
group or in the population from which it is drawn, is 152.4
pounds with a $\sigma_{(\text{est. } X)}$ of 12.4 pounds. The chances, therefore,
are about two in three that our prediction does not miss our
man's true weight by more than ±12.4 pounds.
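Formulas (58) and (59) reduce to one multiplication each. The sketch below (the helper name `se_estimate` is hypothetical) reproduces the two standard errors of estimate and the one- and three-error bands described above for the predictions of 69 inches and 152.4 pounds.

```python
import math


def se_estimate(sd, r):
    """Formulas (58)/(59): sigma of the predicted variable times sqrt(1 - r^2)."""
    return sd * math.sqrt(1 - r ** 2)


r = 0.60
se_height = se_estimate(2.62, r)   # about 2.1 inches
se_weight = se_estimate(15.54, r)  # about 12.4 pounds

# Predictions from the text with bands of one and three standard errors
for label, pred, se, unit in [("height at 160 lb", 69.0, se_height, "in."),
                              ("weight at 71 in.", 152.4, se_weight, "lb")]:
    print(f"{label}: {pred} {unit}; 2 in 3 chance within ±{se:.1f}, "
          f"nearly certain within ±{3 * se:.1f}")
```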


2. The Accuracy of Individual Predictions from Regression 
Equations 

The formulas for $\sigma_{(\text{est.})}$ measure the error made in taking
predicted, instead of actual, X and Y measures. If r = 1.00,
$\sqrt{1 - r^2}$ is 0, and $\sigma_{(\text{est.})}$ is zero: there is no error of estimate,
and each person's measurement is predicted exactly. On the
other hand, when r = .00, $\sqrt{1 - r^2}$ = 1.00, and the error of
estimate is equal to the σ of the distribution into which prediction
is made. When this last situation occurs, the regression
equation is of no value in helping us to predict scores, as each
person's most probable score (e.g., X) is simply the mean
(i.e., $M_x$). When r = .00, all that we can say definitely is that a
subject's score lies somewhere in the distribution of Y's or X's;
but just where we cannot tell, since our SE of estimate equals
the SD of the test.

It is clear from formulas (58) and (59) that the accuracy of
prediction from a regression equation depends directly upon the
σ's of the two distributions ($\sigma_y$ or $\sigma_x$) and upon the degree of
correlation between the two sets of measures. If the variability
($\sigma_y$) of Y is small, and the correlation between Y and X high
(e.g., .90), values of Y can be predicted from known values of X
with a comparatively high degree of accuracy. However, when
the variability of a test is large, or the correlation low (or when
both conditions obtain), prediction from regression equations
becomes so unreliable as to be almost valueless. Even when the
correlation is fairly high, forecasts will often have an uncomfortably
large error of estimate. Thus we have seen that in
spite of the r of .60 between height and weight (Fig. 52), our
forecast of a man's weight, knowing his height, has a $\sigma_{(\text{est. } X)}$ of
about 12 pounds (p. 321). Prediction of height from weight is
somewhat better than prediction of weight from height: predicted
heights will, in two-thirds of the cases, be in error by not
more than 2 inches. An example in which high correlation offsets
fairly large variability, permitting reasonably accurate
forecasts, is given later in Figure 54.

When an investigator uses the regression equations for purposes
of prediction, he should always give the $\sigma_{(\text{est.})}$ of his estimated
scores. The value of a forecast depends, first of all,
upon the size of the error of estimate; but it also depends upon
the units of measurement, and upon the purposes for which the
prediction is made (p. 333).


3. The Accuracy of Group Predictions 


We have seen in (2) above that the standard error of a predicted score (σ(est.)) may often be uncomfortably large. Only when r = 1.00 is √(1 − r²) = .00, and only then can an estimate be made without error. The correlation coefficient must be .87 before √(1 − r²) is .50, i.e., before the standard error of estimate is reduced 50% below the σ of the test. Obviously, unless r is quite large (larger than we usually get in practice), the regression equation is of little aid in forecasting with reasonable accuracy what a given individual may be expected to do (p. 334). This has led many to discount unwisely the value of correlation in prediction and to conclude that the calculation of r is not worth the trouble.

Fortunately, correlation makes out better in forecasting the performance of groups than in predicting the most likely achievement of a given individual. In forecasting achievement the psychologist is in much the same position as the insurance statistician or actuary. The actuary cannot tell how long John Smith, aged twenty, will live. But from his tables, he can tell quite accurately how many of 10,000 men now aged twenty will live to be thirty, forty, or fifty years old. In the same way, the psychologist may be quite uncertain concerning the performance of a given individual. But knowing the correlation between a test (or test battery) and some criterion of performance, he can often forecast with considerable accuracy the probable performance of various groups chosen from his distribution of test scores. The degree of accuracy in such predictions depends upon the size of the correlation coefficient.

To illustrate “actuarial” prediction in psychology, suppose that 70% of a freshman class of 400 men achieve grades in their college work above the minimum passing mark and hence are regarded as “satisfactory” students. Suppose, further, that the correlation between a standard intelligence test and freshman performance is .50. Now if we had selected the upper half of our group (i.e., the 200 students who performed best on the intelligence test) at the beginning of the term, how many of these 200 would have been “satisfactory,” i.e., in the upper 70% of the “grades” distribution? From Table 50 it can easily be read that 84% of our 200 selected freshmen (i.e., 168) should be found in the satisfactory group with respect to grades. The entry .84 is found in column .50 (proportion of test distribution chosen) opposite the correlation of .50. This result should be compared with the 70% (i.e., 140) who might be expected to fall in the satisfactory group when selection is by “guess,” without knowledge of the correlation. This entry is in column .50 opposite the r of .00.

The probable performance of other and smaller groups chosen from our test distribution can be estimated with much greater accuracy from Table 50. We know, for example, that 91% of the best 20% of our students (roughly, seventy-three in the first eighty) can be expected to prove satisfactory in terms of our criterion (i.e., being located in the upper 70% of the grade distribution). Read the entry .91 in column .20 opposite r = .50.


TABLE 50*

PROPORTION OF STUDENTS CONSIDERED SATISFACTORY IN TERMS OF GRADES = .70

             Selection Ratio: Proportion Selected on Basis of Tests

  r     .05   .10   .20   .30   .40   .50   .60   .70   .80   .90   .95
 .00    .70   .70   .70   .70   .70   .70   .70   .70   .70   .70   .70
 .05    .73   .73   .72   .72   .72   .71   .71   .71   .71   .70   .70
 .10    .77   .76   .75   .74   .73   .73   .72   .72   .71   .71   .70
 .15    .80   .79   .77   .76   .75   .74   .73   .73   .72   .71   .71
 .20    .83   .81   .79   .78   .77   .76   .75   .74   .73   .71   .71
 .25    .86   .84   .81   .80   .78   .77   .76   .75   .73   .72   .71
 .30    .88   .86   .84   .82   .80   .78   .77   .75   .74   .72   .71
 .35    .91   .89   .86   .83   .82   .80   .78   .76   .75   .73   .71
 .40    .93   .91   .88   .85   .83   .81   .79   .77   .75   .73   .72
 .45    .94   .93   .90   .87   .85   .83   .81   .78   .76   .73   .72
 .50    .96   .94   .91   .89   .87   .84   .82   .80   .77   .74   .72
 .55    .97   .96   .93   .91   .88   .86   .83   .81   .78   .74   .72
 .60    .98   .97   .95   .92   .90   .87   .85   .82   .79   .75   .73
 .65    .99   .98   .96   .94   .92   .89   .86   .83   .80   .75   .73
 .70   1.00   .99   .97   .96   .93   .91   .88   .84   .80   .76   .73
 .75   1.00  1.00   .98   .97   .95   .92   .89   .86   .81   .76   .73
 .80   1.00  1.00   .99   .98   .97   .94   .91   .87   .82   .77   .73
 .85   1.00  1.00  1.00   .99   .98   .96   .93   .89   .84   .77   .74
 .90   1.00  1.00  1.00  1.00   .99   .98   .95   .91   .85   .78   .74
 .95   1.00  1.00  1.00  1.00  1.00   .99   .98   .94   .86   .78   .74
1.00   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00   .88   .78   .74


If the correlation of the intelligence test and school grades had been .60 instead of .50, 87% (or 174 in 200) of the “best half” according to the test would have been satisfactory students; and 95% of the “best” 20% on the test should be satisfactory students. These forecasts are to be compared with 70%, the estimate when r = .00. It is clear that a knowledge of the correlation greatly improves the estimate, and the larger the r the better the forecast.

Table 50 is a small part of a larger table in which “proportions considered satisfactory in achievement” range from .05 to .95.† The correlation between test score and performance ranges from .00 to 1.00. These tables are strictly accurate only when the distributions are normal both in the test and in the criteria of performance. They may be used with considerable confidence, however, when the distributions are approximately normal, especially when the N's are large; and in any case they furnish useful approximations.

* Taylor, H. C., and Russell, J. T., “The Relationships of Validity Coefficients to the Practical Effectiveness of Tests in Selection: Discussion and Tables,” Journal of Applied Psychology, 23 (1939), 565-578.

† Taylor, H. C., and Russell, J. T., op. cit.
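Under the normality assumption just stated, any entry of Table 50 can in principle be recomputed from the bivariate normal surface. The sketch below does this with the Python standard library only. The function name `proportion_satisfactory` and the numerical method (Simpson's rule over the selected upper tail of the test distribution) are our own illustrative choices, not Taylor and Russell's procedure, and the results should be expected to agree with the table only to about the second decimal.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def proportion_satisfactory(r: float, selection_ratio: float, base_rate: float,
                            steps: int = 2000) -> float:
    """Expected proportion of selected persons who prove 'satisfactory'.

    Hypothetical helper. Assumes test X and criterion Y are bivariate normal
    with correlation r. selection_ratio: proportion taken from the top of the
    test distribution (0 < p < 1). base_rate: proportion of the whole group
    considered satisfactory (0 < p < 1).
    """
    x_cut = STD_NORMAL.inv_cdf(1 - selection_ratio)  # test cutoff, z units
    y_cut = STD_NORMAL.inv_cdf(1 - base_rate)        # criterion cutoff, z units
    if r >= 1.0:
        # Perfect correlation: the top of X is exactly the top of Y.
        return min(1.0, base_rate / selection_ratio)
    sd_y_given_x = (1 - r * r) ** 0.5                # SD of Y for a fixed X, z units

    def integrand(x: float) -> float:
        # density of X at x times P(Y > y_cut | X = x)
        return STD_NORMAL.pdf(x) * (1 - STD_NORMAL.cdf((y_cut - r * x) / sd_y_given_x))

    # Simpson's rule over [x_cut, x_cut + 10]; the normal tail beyond is negligible.
    lo, hi = x_cut, x_cut + 10.0
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    joint = total * h / 3  # P(X > x_cut and Y > y_cut)
    return joint / selection_ratio


for r, ratio in [(0.50, 0.50), (0.50, 0.20), (0.60, 0.50), (0.60, 0.20)]:
    print(r, ratio, round(proportion_satisfactory(r, ratio, 0.70), 2))
```

With `base_rate=0.70` the four printed values should land close to the .84, .91, .87, and .95 quoted in the text; other base rates correspond to other parts of the larger Taylor-Russell table, again only insofar as the normality assumption holds.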

Forecasting tables have considerable value in selecting personnel for business or other vocations. First, we must determine what proportion of a given group of workers is to be considered “successful.” With this information in hand and knowing the correlation between our test battery and performance in the given activity, we may forecast the probable success of groups of new applicants from their test scores. Assume, for example, that 70% of a group of factory workers are regarded as “acceptable workers,” acceptability having been determined from ratings by foremen, number of pieces done in a given time, or time taken to complete certain standard jobs. Assume, further, that a test battery has a correlation of .45 with worker performance. Then if we select the best twenty out of 100 applicants (“best” according to our tests), we find from Table 50 that 90% of this number, or eighteen, should be acceptable workers. If we had had no test and had simply selected the first twenty applicants to appear (or any twenty), 70%, or fourteen, should be acceptable. Use of the tests improves our forecast by about 30% (eighteen acceptable workers instead of fourteen); and the more stringent the criterion of acceptability, the greater the improvement in forecast made by the tests.


III. THE EFFECT OF VARIABILITY OF MEASURES
UPON THE SIZE OF r
Suppose that the correlation between two tests in a small group of fifty sixth-grade children has been found to be .50. How will this correlation compare with that between the same tests in a larger group of greater range, e.g., a group of 200 children in the sixth grade or 200 children spread over grades six, seven, and eight? More generally, knowing the correlation between two tests in a group of narrow range, can we predict the probable correlation in a group of wider range?

The problem of the effect upon r of the “range of talent” (size of σ_x and σ_y) within the group being studied often arises in correlational work. It becomes important, for example, when one wishes to go beyond the correlation obtained in the sample with which he is working and generalize (estimate the r) for a group of wider range; or when r's between the same tests obtained in different ranges are to be compared. A formula for estimating the correlation between two tests in a heterogeneous group when we know the correlation between the tests in a homogeneous group may be developed in the following way. Let σ(est. y1) be the standard error of estimate in a group somewhat curtailed in variability or in range of talent, and σ(est. y2) be the standard error of estimate in a larger group less restricted in variability. (Y is the dependent variable, p. 314.) Then, on the assumption that our tests are as effective in the wide as in the narrow range, σ(est. y1) = σ(est. y2), or, by formula (58), page 320,

$$\sigma_{y_1}\sqrt{1 - r_1^2} = \sigma_{y_2}\sqrt{1 - r_2^2}$$

and

$$\frac{\sigma_{y_1}}{\sigma_{y_2}} = \frac{\sqrt{1 - r_2^2}}{\sqrt{1 - r_1^2}} \qquad (60)$$

(formula for estimating correlation in a wide range from a knowledge of the correlation in a narrow range)

in which σ_y1 is the standard deviation of Y in the small group, or in the curtailed range; σ_y2 is the standard deviation of Y in the large group, or in the uncurtailed range; r_1 = the correlation in the small group, and r_2 = the correlation in the large group.


To illustrate formula (60), suppose that in a small group σ_y1 = 10 and r_1 is .50. What would the r between the same two tests probably be in a group in which σ_y2 is 15, i.e., in which σ_y2 is 50% larger than σ_y1? Substituting σ_y1 = 10, σ_y2 = 15, and r_1 = .50 in (60), we have

$$\frac{10}{15} = \frac{\sqrt{1 - r_2^2}}{\sqrt{1 - .50^2}}$$

Squaring both sides of this equation and solving, we have 1 − r_2² = (4/9)(.75) = .33, so that r_2 = .82. The r of .50 in the narrow range becomes an r of .82 in the wider range. It is clear from this example that direct comparison of r's is not valid when the variabilities (σ's) within the groups from which the r's were computed are quite different.
If X and not Y is the dependent variable, formula (60) becomes

$$\frac{\sigma_{x_1}}{\sigma_{x_2}} = \frac{\sqrt{1 - r_2^2}}{\sqrt{1 - r_1^2}} \qquad (61)$$

(formula for estimating correlation in a wide range from a knowledge of the correlation in a narrow range)


Formulas (60) and (61) are open to the objection that each takes account of only one distribution in estimating the probable increase in r with increase in range of talent. If, however, the increase in σ_y as the group becomes more heterogeneous is accompanied by a proportional increase in σ_x (or vice versa), formulas (60) and (61) will give accurate estimates. Experimental trial of these formulas has yielded results closely in accord with theoretical expectation.*
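Formulas (60) and (61) can be solved for r_2 directly. The sketch below is a minimal illustration; the function name `r_in_wider_range` is hypothetical, and it assumes, as the text does, that the standard error of estimate is the same in the narrow and the wide group. It rejects inputs for which the implied 1 − r_2² would exceed 1.

```python
from math import sqrt


def r_in_wider_range(r_narrow: float, sd_narrow: float, sd_wide: float) -> float:
    """Estimate r in a wide range from r in a narrow range, by formula (60) or (61).

    Hypothetical helper. sd_narrow, sd_wide: SDs of the dependent variable
    (Y for (60), X for (61)) in the curtailed and the uncurtailed group.
    Assumes the standard error of estimate is equal in both groups.
    """
    # Squaring (60): 1 - r_2^2 = (sd_narrow / sd_wide)^2 * (1 - r_1^2)
    one_minus_r2_sq = (sd_narrow / sd_wide) ** 2 * (1 - r_narrow ** 2)
    if one_minus_r2_sq > 1:
        raise ValueError("inconsistent inputs: implied 1 - r^2 exceeds 1")
    return sqrt(1 - one_minus_r2_sq)


print(round(r_in_wider_range(0.50, 10, 15), 2))  # the text's example: 0.82
```

The same function serves for (61) by passing the SDs of X instead of Y, since the two formulas differ only in which variable's σ appears.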


IV. THE SOLUTION OF A SECOND CORRELATION PROBLEM


The solution of a second correlation problem will be found in Figure 54. The purpose of another “model” is to strengthen the reader's grasp of correlational techniques by having him work straight through the process of calculating r and the regression equations upon a new set of data. A student often fails to relate the various aspects of a correlational problem when these are presented in piecemeal fashion.

* Peters, C. C., and Van Voorhis, W. R., Statistical Procedures and Their Mathematical Bases (1940), pp. 208-212.


1. Calculation of r

Our first problem in Figure 54 is to find the correlation between the I.Q.'s achieved by 190 children of the same (or approximately the same) chronological age who have taken an intelligence examination upon two occasions separated by a six months' interval. The correlation table has been constructed from a scattergram, as described on page 275. The test given first is the X-variable, and the test given second is the Y-variable. The calculation of the two means, and of σ_x, σ_y, c_x, and c_y, covers familiar ground, is given in detail on the chart, and need not be repeated here.

The product-deviations in the Σx'y' column have been taken from column 100-104 (the column containing the AM_x) and from row 105-109 (the row containing the AM_y). The entries in the Σx'y' column have been calculated by the shorter method described on page 286; that is, each cell entry in a given row has been multiplied first by its x-deviation (x'), and the sum of these products entered in the column Σx'. The Σx' entries were then “weighted” once for all by the y' of the whole row. To illustrate, in the first row reading from left to right, (1 × 5) + (1 × 6), or 11, is the Σx' entry. The x''s are 5 and 6, respectively, and may be read from the x' row at the bottom of the correlation table. Since the common y' is 5, the final Σx'y' entry is 55. Again, in the seventh row reading down from the top of the diagram, (5 × −3) + (3 × −2) + (7 × −1) + (16 × 0) + (2 × 1) + (4 × 2), or −18, makes up the Σx' entry. The y' of this row is −1, and the final Σx'y' entry is 18. To take still a third example, in the eleventh row from the top of the diagram, (1 × −5) + (3 × −4) + (1 × −3) + (2 × −2), or −24, is the Σx' entry. The common y' is −5, and the Σx'y' entry is 120.

Three checks of the calculations (see p. 283), upon which r, σ_x, and σ_y are based, are given in Figure 54. Note that Σx' = Σfx'; and that, when the Σx'y''s are recalculated at the bottom of the chart, Σy' = Σfy', and the two determinations of Σx'y' are equal. When the Σx'y''s have been checked, the calculation of r by formula (44) is a matter of substitution. Note carefully that σ_x, σ_y, c_x, and c_y are all left in units of class interval in the formula for r (p. 288).

[Figure 54 (chart not reproduced): calculation of the correlation between the I.Q.'s achieved by 190 children of the same age on two testings. Correlation table with First Test (X) across and Second Test (Y) down, class intervals 75-79 through 130-134. Regression equations shown on the chart: Y − 102.7 = .69(X − 101.6), or Y = .69X + 32.6; and X − 101.6 = .83(Y − 102.7).]
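The row-by-row “shorter method” and formula (44) can be sketched in Python as below. This is a minimal illustration under stated assumptions: `r_from_correlation_table` is a hypothetical helper, frequencies are supplied as a list of rows (top row first), and the deviations x' and y' are in class-interval units from the assumed means, so σ_x, σ_y, c_x, and c_y stay in class-interval units as the text requires. For the first row of Figure 54, the row's Σx' would be (1 × 5) + (1 × 6) = 11, weighted by y' = 5 to give 55.

```python
from math import sqrt


def r_from_correlation_table(table, x_devs, y_devs):
    """Pearson r from a grouped correlation table, by formula (44).

    Hypothetical helper. table[i][j] is the frequency in row i and column j;
    y_devs[i] and x_devs[j] are the deviations y' and x' (in class intervals)
    of that row and column from the assumed means.
    """
    n = sum(sum(row) for row in table)
    fx = [sum(row[j] for row in table) for j in range(len(x_devs))]  # column totals
    fy = [sum(row) for row in table]                                  # row totals

    c_x = sum(f * d for f, d in zip(fx, x_devs)) / n   # corrections, class-interval units
    c_y = sum(f * d for f, d in zip(fy, y_devs)) / n
    sd_x = sqrt(sum(f * d * d for f, d in zip(fx, x_devs)) / n - c_x ** 2)
    sd_y = sqrt(sum(f * d * d for f, d in zip(fy, y_devs)) / n - c_y ** 2)

    # Shorter method: sum x' over each row's cells, then weight once by the row's y'.
    sum_xy = 0
    for row, y_dev in zip(table, y_devs):
        row_sum_x = sum(f * x_dev for f, x_dev in zip(row, x_devs))  # the row's Σx' entry
        sum_xy += row_sum_x * y_dev                                   # the row's Σx'y' entry

    return (sum_xy / n - c_x * c_y) / (sd_x * sd_y)


# A small hypothetical 3 x 3 table (not the data of Figure 54); top row first.
table = [[0, 1, 2],   # y' = +1
         [1, 3, 1],   # y' =  0
         [2, 1, 0]]   # y' = -1
print(round(r_from_correlation_table(table, x_devs=[-1, 0, 1], y_devs=[1, 0, -1]), 3))  # 0.667
```

Because r is a ratio, leaving every quantity in class-interval units does not change its value; only the means and σ's need converting back to score units.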


2. Calculation of the Regression Equations and the SE's of Estimate

The regression equations in deviation form are given on the chart, and the two lines which these equations represent have been plotted on the diagram. Note that these equations may be plotted as they stand, since the class interval is the same for X and for Y (p. 316). In the routine solution of a correlational problem it is not strictly necessary to plot the regression lines on the chart. These lines are often of value, however, in indicating whether the means of the X- and Y-arrays can be represented by straight lines, that is, whether regression is “linear.” If the relationship between X and Y is not linear, other methods of calculating the correlation must be employed (p. 365).

The standard errors of estimate, shown in Figure 54, are 7.83 and 8.55, depending upon whether the prediction is of Y from X or of X from Y. All I.Q.'s predicted on the Y-test from X may be considered to have the same error of estimate,* and similarly for all predictions of X from Y.

Errors of estimate are most often used to give the reliability of specific predicted measures. But they also have a more general interpretation. Thus a σ(est. Y) of 7.83 points means that two-thirds of the I.Q.'s in test Y missed perfect correspondence with the values predicted from the I.Q.'s in test X by ±7.83 points or less, while the other one-third missed complete agreement by more than ±7.83 points. Stated differently, we may say that 68% of the I.Q.'s predicted on test Y from test X may be expected to differ from their actual values by not more than ±7.83 points, while the remaining 32% may be expected to differ from their actual values by more than ±7.83 points.
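The 68% reading of σ(est. Y) rests on the errors of prediction being roughly normal and of the same size at every level of X. One way to see the interpretation at work is a simulation. The sketch below (hypothetical function name, simulated data only, with r set to .76 to mimic Figure 54) draws bivariate normal pairs, fits the regression of Y on X, and counts how many actual Y's fall within ±σ(est. Y) of their predicted values.

```python
import random
from math import sqrt


def fraction_within_se(r: float, n: int = 20000, seed: int = 1) -> float:
    """Share of simulated cases whose actual Y lies within ±σ(est. Y) of the
    Y predicted from X. Hypothetical helper; simulated data only."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Y built to correlate r with X; errors normal, equal spread at every X
        y = r * x + sqrt(1 - r * r) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)

    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    r_obs = sxy / sqrt(sxx * syy)
    slope = sxy / sxx                          # regression coefficient of Y on X
    se_est = sqrt(syy) * sqrt(1 - r_obs ** 2)  # formula (58)

    hits = sum(abs(y - (my + slope * (x - mx))) <= se_est for x, y in zip(xs, ys))
    return hits / n


print(round(fraction_within_se(0.76), 3))  # expect a value near .68
```

With real test data the fraction can depart from 68% if the errors are skewed or larger at some score levels than at others, which is the point raised in the Terman and Merrill footnote.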

* See, however, Terman, L. M., and Merrill, M. A., Measuring Intelligence (1937), pp. 44-47, where the SE's of estimate have been computed for various I.Q. levels.


3. The “Regression Effect” in Prediction


Predicted scores tend to “move in” toward the mean of the distribution into which prediction is made (p. 311). This so-called regression effect has often been noted by investigators and is always present when correlation is less than ±1.00.* The regression phenomenon can be clearly seen in the following illustrations. From the regression equation Y = .69X + 32.6 (Fig. 54) it is clear that a child who earns an I.Q. of 130 on the first test (X) will most probably earn an I.Q. of 122 on the second test (Y); while a child who earns an I.Q. of 120 in X will most probably score 115 in Y. In both of these illustrations the predicted Y-test I.Q. is smaller than the first or X-test I.Q. Put differently, the second I.Q. has regressed or moved down toward the mean of test Y, i.e., toward 102.7. The same effect occurs when the I.Q. on the X-test is below its mean; the tendency now is for the predicted score in Y to move up toward its mean. Again, from the equation Y = .69X + 32.6, we find that if a child earns an I.Q. of 70 on the X-test, his most likely score on the second test (Y) is 81; while an I.Q. of 80 on the first test forecasts an I.Q. of 88 on the second. Both of these predicted I.Q.'s have moved up nearer to the mean 102.7 (M_y).
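The arithmetic of these illustrations can be checked with a few lines of Python. The sketch uses only the regression equation Y = .69X + 32.6 and M_y = 102.7 from the text; M_x is recovered from the fact that the regression line passes through the two means, so M_x = (102.7 − 32.6)/.69 ≈ 101.6. The function name `predict_y` is our own.

```python
M_Y = 102.7                  # mean of the second test (Y), from the text
M_X = (M_Y - 32.6) / 0.69    # the regression line passes through (M_X, M_Y)


def predict_y(x: float) -> float:
    """Most probable second-test I.Q. from a first-test I.Q.: Y = .69X + 32.6."""
    return 0.69 * x + 32.6


for x in (130, 120, 80, 70):
    y = predict_y(x)
    print(f"X = {x}: predicted Y = {y:.0f}  "
          f"(X is {x - M_X:+.1f} from M_X; predicted Y is {y - M_Y:+.1f} from M_Y)")
```

In every case the predicted Y lies closer to its mean than the X score lies to its own mean, which is the regression effect described above.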

The tendency for all scores predicted from a regression equation to pull in (down or up) toward the mean can best be seen as a general phenomenon if the regression equation is written in standard-score form. Given

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x \qquad (54),\ \text{p. 312}$$

if we divide both sides of this equation by σ_y and write σ_x under x, we have

$$\frac{y}{\sigma_y} = r\,\frac{x}{\sigma_x} \quad\text{or}\quad z_y = r\,z_x \qquad (62)$$

(regression equation when scores in X and Y are expressed as z or standard scores)

* Thorndike, R. L., “Regression Fallacies in the Matched Groups Experiment,” Psychometrika, 7 (1942), 85-102.


In the problem in Figure 54, z_y = .76z_x. If z_x is +1.00σ, +2.00σ, or +3.00σ from M_x, z_y will be +.76σ, +1.52σ, or +2.28σ from M_y. That is to say, any score above or below the mean of X forecasts a Y-score somewhat closer to the mean of Y.
In studying the relation of height in parent and offspring, Galton (p. 311) interpreted the phenomenon of regression to the mean to be a provision of nature designed to protect the race from extremes. This same effect occurs, however, in any correlation table in which r is less than ±1.00, and need not be explained in biological terms. The I.Q.'s of a group of very bright children, for instance, will tend upon retest to move downward toward 100, the mean of the group; while the I.Q.'s of a group of dull children will tend upon retest to move upward toward 100.


V. THE INTERPRETATION OF THE COEFFICIENT 
OF CORRELATION 

When should a coefficient of correlation be called “high,” when “medium,” and when “low”? Does an r of .40 between two tests indicate “marked” or “low” relationship? How high should an r be in order to permit accurate prediction from one variable to another? Can an r of .50, say, be interpreted with respect to “overlap” of determining factors in the two variables correlated? Questions like these, all of which are concerned with the significance or meaning of the relationship expressed by a correlation coefficient, constantly arise in problems involving mental measurement, and their implications must be understood before we can effectively employ the correlational method.

The value of r as a measure of correspondence may be profitably considered from two points of view.* In the first place, r's are computed in order to determine whether there is any correlation (over and above chance) between two variables; and in the second place, r's are computed in order to determine the degree or closeness of relationship when some association is known, or is assumed, to exist. The question, “Is there any correlation between brain weight and intelligence?” voices the first objective. And the question, “How significant is the correlation between high-school grades and first-year performance in college?” expresses the second. The problem of when an obtained r denotes significant relationship has already been considered on page 297. This section is concerned mainly with the second problem, namely, the evaluation, with respect to degree of relationship, of an obtained coefficient. The questions at the beginning of the paragraph above all bear upon this topic.

* Barr, A. S., “The Coefficient of Correlation,” Journal of Educational Research, 23 (1931), 55-60.


1. The Interpretation of r in Terms of Verbal Description

It is customary in mental measurement to describe the correlation between two tests in a general way as high, marked or substantial, low or negligible. While the descriptive label applied will vary somewhat in meaning with the author using it, there is fairly good agreement among workers with psychological and educational tests that an

r from .00 to ±.20 denotes indifferent or negligible relationship;
r from ±.20 to ±.40 denotes low correlation, present but slight;
r from ±.40 to ±.70 denotes substantial or marked relationship;
r from ±.70 to ±1.00 denotes high to very high relationship.


This classification is broad and somewhat tentative, and can only be accepted as a general guide with certain reservations. Thus a coefficient of correlation must always be judged with regard to

(1) the nature of the variables with which we are dealing;

(2) the significance of the coefficient;

(3) the size and variability of the group (p. 325);

(4) the reliability coefficients of the tests used (p. 380);

(5) the purpose for which the r was computed.

To consider, first, the matter of the variables being correlated, an r of .30 between height and intelligence, or between head measurements and mechanical ability, would be regarded as important although it is rather low, since correlations between physical and mental functions are usually much lower, often zero. On the other hand, the correlation must be .70 or more between measures of general intelligence and school grades, or between achievement in English and in history, to be considered high, since r's in this field usually run from .40 to .60. Resemblances of parents and offspring with respect to physical and mental traits are expressed by r's of .35 to .55; and, accordingly, an r of .60 would be high.* By contrast, the reliability of a standard intelligence test is ordinarily much higher than .60, and the self-correlation of such a test must be .85 to .95 to be regarded as high. In the field of vocational testing, the r's between test batteries and measures of aptitude represented by various criteria rarely rise above .50;† and r's above this figure would be considered exceptionally promising.

Correlation coefficients must be evaluated also with due regard to the reliabilities (p. 380) of the two tests concerned. Because of chance errors, an obtained r is always less than its “corrected” value (p. 396) and hence, in a sense, is a minimum measure of the relationship present. The effect upon r of the size and variability of the group is discussed elsewhere (p. 325), and a formula for estimating such effect is provided. The purpose for which the correlation has been computed is important. The r which is to be employed in predicting the scores of individuals from one test to another, for instance, should be much higher than the r whose purpose is to provide forecasts of the achievement of selected groups (p. 322).

In summary, a correlation coefficient is always to be judged with reference to the circumstances under which it was obtained. There is no such thing as the correlation between mechanical aptitude and abstract intelligence, for instance, but only a correlation between certain tests of mechanical aptitude and intelligence, given to certain groups under definite conditions. Correlation coefficients are always to be thought of as conditional and never as absolute indices of relationship.

* Jones, H. E., A First Study of Parent-Child Resemblance in Intelligence, 27th Yearbook of the N.S.S.E. (1928), Part I, 61-72.

† Stead, W. H., and Shartle, C. L., Occupational Counseling Techniques (1940), Chapters 7 and 8.


2. The Interpretation of r in Terms of σ(est.) and the Coefficient of Alienation

One of the most practical ways of evaluating the effectiveness of a coefficient of correlation is through the standard error of estimate, σ(est.). We have found (p. 320) that σ(est. Y), which equals σ_y√(1 − r²), enables us to tell how accurately we can estimate (by means of the regression equation) an individual's score in Test Y when we know his score in Test X. The size of σ(est. Y) depends directly upon σ_y and upon the correlation between the two tests. When r = 1.00, σ(est. Y) = .00, and we can predict a person's score in Y, knowing his score in X, with 100% accuracy, i.e., without error. On the other hand, when r = .00, σ(est. Y) = σ_y, and we can only be certain that the predicted score lies somewhere within the limits of the Y-distribution, i.e., within the limits Mean Score ± 3σ_y. In other words, when r = .00 our estimate of a person's Y-score is not aided at all by a knowledge of his score in X. As r decreases from 1.00 to .00, the standard error of estimate increases so markedly that predictions from the regression equation range all the way from certainty to what is virtually a “guess.”* The significance of an r with respect to predictive value, therefore, may be accurately gauged by the extent to which r improves our prediction over a “mere guess.”

The following problem will serve as an illustration. Suppose that the correlation between two tests Y and X is .60, and that σ_y = 5.00. Then σ(est. Y) is 5 × √(1 − .60²), or 4.00. This SE is 20% less than 5.00, the σ(est. Y) when r = .00, i.e., when σ(est. Y) has minimum predictive value. The reduction in σ(est. Y) as r varies from .00 to 1.00 is governed by the expression √(1 − r²), the ratio of σ(est. Y) to σ_y; hence it is possible from √(1 − r²) alone to gauge the predictive value of an r. The expression √(1 − r²) is often called the coefficient of alienation and is denoted by the letter k. The coefficient of alienation may be thought of as measuring the absence of relationship between two variables X and Y in the same sense in which r measures the presence of relationship. When k = 1.00, r = .00, and when k = .00, r = 1.00: the larger the coefficient of alienation, the smaller the degree of relationship and the less precise the prediction from X to Y. In order to show how the estimate improves as r increases, the k's for certain values of r from .00 to 1.00 are tabulated in Table 51.

* The term “guess” as here used does not imply an estimate which is based upon no information whatsoever (a shot in the dark, so to speak). When r = .00, the most probable Y-score predicted for every individual in the X-distribution is M_y, and σ(est. Y) = σ_y. Hence, our Y-estimates are “guesses” in the sense that they may lie anywhere in the Y-distribution, but not anywhere at all!


TABLE 51

COEFFICIENTS OF ALIENATION (k) FOR VALUES OF r FROM .00 TO 1.00

    r       k = √(1 − r²)        r       k = √(1 − r²)
  .0000       1.0000          .8000        .6000
  .1000        .9950         (.8660)      (.5000)
  .2000        .9798          .9000        .4359
  .3000        .9539          .9500        .3122
  .4000        .9165          .9800        .1990
  .5000        .8660          .9900        .1411
  .6000        .8000         1.0000        .0000
  .7000        .7141
 (.7071)      (.7071)
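Table 51 and the σ(est. Y) illustration above reduce to one-line formulas. The sketch below (helper names are our own) regenerates the k column of the table, reproduces the 5 × √(1 − .60²) example, and inverts k = √(1 − r²) to find the r needed for a given k, such as the .866 required to halve the standard error of estimate.

```python
from math import sqrt


def alienation(r: float) -> float:
    """Coefficient of alienation, k = sqrt(1 - r^2). Hypothetical helper name."""
    return sqrt(1 - r * r)


def se_of_estimate(sd_y: float, r: float) -> float:
    """Formula (58): sigma(est. Y) = sd_y * sqrt(1 - r^2) = sd_y * k."""
    return sd_y * alienation(r)


def r_for_alienation(k: float) -> float:
    """Correlation required for a given k (the inverse of alienation)."""
    return sqrt(1 - k * k)


# Rebuild the k column of Table 51.
for r in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99, 1.0):
    print(f"r = {r:.4f}   k = {alienation(r):.4f}")

print(round(se_of_estimate(5.0, 0.60), 2))  # the text's example: sigma_y = 5, r = .60
print(round(r_for_alienation(0.50), 3))     # r needed to halve sigma(est. Y)
```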


Note that r must be .866 before k lies halfway between 1.00 and .00, i.e., before the standard error of estimate is reduced to one-half of its value when r = .00. For r's of .80 or less, the coefficients of alienation are clearly so large that predictions of individual scores based upon the regression equation are little better than “guesses.”* Even when r = .99, the standard error of estimate is still 1/7 as large as when r = .00. In contrast to actuarial prediction, therefore, the estimation of an individual's score in one test from another is not warranted unless r is at least .90.

* An r is more efficient in forecasting the probable success of a group (see p. 322).

The coefficient E given by the formula below is often useful in providing a quick estimate of the predictive efficiency of an obtained r. E, which is called the “coefficient of forecasting efficiency” or the coefficient of dependability, is derived from k as follows:
$$E = 1 - \sqrt{1 - r^2} \quad\text{or}\quad E = 1 - k \qquad (63)$$

(“coefficient of forecasting efficiency” or coefficient of dependability)*

To illustrate the application of E, suppose that the correlation of a test (or of a test battery) with some criterion of performance is .50. From formula (63), E = 1 − .87, or .13; and the test's efficiency in predicting criterion scores may be said to be 13%. When r = .90, E = .56 and the test is 56% efficient; when r = .98, E = .80 and the test is 80% efficient; and so on. Obviously, the correlation must be above .87 for the test's forecasting efficiency to be greater than 50%.

E gives essentially the same information as σ(est. Y) or k. Thus, if r = .50, k = .87 and σ(est. Y) is 87% of σ_y, which is its value when r = .00. Accordingly, an r of .50 reduces σ(est. Y) by 13%.
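Formula (63) is easy to tabulate. The sketch below (the function name is hypothetical) computes E for the correlations used in the text and shows why E passes 50% only once r exceeds .866, the point at which k falls to .50.

```python
from math import sqrt


def forecasting_efficiency(r: float) -> float:
    """Formula (63): E = 1 - sqrt(1 - r^2) = 1 - k, the proportion by which
    sigma(est. Y) falls below its value when r = .00. Hypothetical helper."""
    return 1 - sqrt(1 - r * r)


for r in (0.50, 0.866, 0.90, 0.98):
    print(f"r = {r:.3f}   E = {forecasting_efficiency(r):.2f}")
```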


3. The Interpretation of r in Terms of the Coefficient of Determination (r²)

The interpretation of r in terms of “overlapping” factors in the tests being correlated may be generalized through an analysis of the variance (σ²) of the dependent variable, usually the Y test. In studying the variability among individuals upon a given test, the variance of the test scores is often a more useful measure of “spread” than is the standard deviation. The object in analyzing the variance of Test Y is to determine from the correlation between Y and X what part of Test Y's variance is associated with, or dependent upon, the variance of Test X, and what part is determined by the variance of factors not in Test X.

* See Conrad, H. S., and Martin, C. B., “The Index of Forecasting Efficiency, for the Case of a ‘True’ Criterion,” Journal of Experimental Education, 4 (1935), 231-244. Ezekiel, M., Methods of Correlation Analysis (2nd ed., 1941), p. 139 and pp. 2112919.

If we have calculated the correlation between Tests Y and X, σ²_Y gives a measure of the total variance of the Y-scores; and σ²(est. Y), which equals σ²_Y(1 − r²_YX), gives a measure of the variance left in Test Y when that part of the variance produced by Test X is ruled out or held constant.* To illustrate, if we have the correlation between height and weight in a group of school children, the variance of height will be reduced to σ²(est. height) when the variance in weight is zero, i.e., when all of the children have the same weight. If σ²(est. Y) is subtracted from σ²_Y, there remains that part of the variance of Test Y which is associated with Test X; and if this value is divided by σ²_Y, we obtain that fraction of the variance of Test Y attributable to or associated with Test X. Carrying out the operations described, we have


$$\frac{\sigma_Y^2 - \sigma_{(\text{est. }Y)}^2}{\sigma_Y^2} = \frac{\sigma_Y^2 - \sigma_Y^2 + \sigma_Y^2 r_{YX}^2}{\sigma_Y^2} = r_{YX}^2$$


from which it is clear that r²_YX gives the proportion of the variance of Test Y which is associated with Test X. When used in this way, r² is called the coefficient of determination. If the correlation between Tests Y and X is .707, r² is .50. Hence, an r of .707 means that 50% of the variance of Test Y is associated with the variability in Test X. Since r² + k² = 1.00,† the proportion of the variance in Test Y which is not associated with Test X is given by k². In the present case, since r² is .50, k² is also .50.

* See Chapter XIII for further discussion of this topic.
† See Table B.

The coefficient of determination tells us what part of the variance of Test Y is determined by Test X. But alone it gives us no information as to the character of the association, and we cannot assume a causal relationship unless we have evidence beyond the correlation. Inspection of the squares of small coefficients of correlation emphasizes the slight degree of association, in terms of related changes in variability, indicated by low r's. An r of .10, for example, or .20, or .30, between Tests X and Y, indicates that only 1%, 4%, and 9%, respectively, of the variance of Y is associated with X. On the other hand, when r is .95, about 90% (r² = .90) of the variance of Test Y is associated with Test X, only 10% being unrelated. Valuable insight into the part played by one or more variables in determining the total variance of a criterion may be obtained through the coefficient of determination.
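As a quick numerical check on the percentages above, the short Python sketch below computes r², the coefficient of alienation k = √(1 − r²), k², and the forecasting-efficiency index E for several values of r. It takes E = 1 − k, which matches the 13% reduction quoted for r = .50; the helper name `determination_summary` is a hypothetical choice made here, not anything from the text.

```python
import math


def determination_summary(r):
    """Return (r2, k, k2, E) for a correlation coefficient r.

    r2: coefficient of determination (share of Test Y variance associated with X)
    k:  coefficient of alienation, sqrt(1 - r**2)
    k2: share of Test Y variance not associated with X, so r2 + k2 == 1
    E:  index of forecasting efficiency, taken here as 1 - k
    """
    r2 = r * r
    k = math.sqrt(1.0 - r2)
    return r2, k, 1.0 - r2, 1.0 - k


for r in (0.10, 0.20, 0.30, 0.50, 0.707, 0.95):
    r2, k, k2, E = determination_summary(r)
    print(f"r = {r:.3f}  r2 = {r2:.2f}  k = {k:.2f}  k2 = {k2:.2f}  E = {E:.2f}")
```

Because r² grows with the square of r, halving r cuts the "explained" share of variance to a quarter, which is why the small coefficients in the text account for so little of the variance of Y.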


4. Summary
It may be helpful to summarize the main points brought out in this section.

(1) Whether an obtained r is to be regarded as "high," "medium," or "low" will depend upon the variables being studied, the reliability coefficients of the two tests, the size of the group and its variability, and the purpose for which the r is being computed. Correlation coefficients are never absolute indices of relationship.

(2) The accuracy with which an r enables us to predict (through the regression equation) individual scores in Test Y from given scores in Test X may be determined from σ(est. Y), from E, and from k, the coefficient of alienation.

(3) The coefficient of determination provides a method of determining what proportion of the total variance (σ²_Y) of Test Y is associated with Test X, and what proportion is independent of Test X. This method of analysis may be extended to problems employing partial and multiple correlation (p. 425).*

* Wright, Sewall, "Correlation and Causation," Journal of Agricultural Research, 20 (1921), 557-585.


PROBLEMS

1. Write out the regression equations in score form for the correlation table in example 3, page 303.
(a) Compute σ(est. Y) and σ(est. X).
(b) What is the most probable height of a boy who weighs 30 pounds? 45 pounds? What is the most probable weight of a boy who is 36 inches tall? 40 inches tall?

2. In example 4, page 305, find the most probable grade made by a child whose score on Army Alpha is 120. What is the σ(est) of this grade?

3. What is the most probable algebra grade of a child whose I.Q. is 100 (data from example 5, p. 306)? What is the σ(est) of this grade?


4. Given the following data for two tests:

    History (X)        English (Y)
    M_X = 75.00        M_Y = 70.00
    σ_X = 6.00         σ_Y = 8.00
    r_XY = .72

(a) Work out the regression equations in score form.
(b) Predict the probable grade in English of a student whose history mark is 65. Find the σ(est. Y) of this prediction.
(c) If r_XY had been .84 (σ's and means remaining the same), how much would σ(est. Y) be reduced?

5. The correlation of a test battery with worker efficiency in a large factory is .40, and 70% of the workers are regarded as "satisfactory."
(a) From seventy-five applicants you select the "best" twenty-five in terms of test score. How many of these should be satisfactory workers?
(b) How many of the best ten should be satisfactory?
(c) How many in the two groups should be satisfactory if selected at random, i.e., without using the test battery?

6. Plot the regression lines on the correlation diagram given in example 5, page 306. Calculate the means of the Y-arrays (successive Y-columns), plot them as points on the diagram, and join these points with straight lines. Plot, also, the means of the X-arrays and join them with straight lines. Compare these two "lines-through-means" with the two fitted regression lines (see Fig. 52, p. 310).

7. In a group of 115 freshmen, the r between reaction time to light and substitution learning is .30. The σ of the reaction times is 20 ms. What would you estimate the correlation between these two tests to be in a group in which the σ of the reaction times is 25 ms?

8. Show the regression effect in example 4, page 305, by calculating the regression equation in standard-score form. For I.Q.'s +1.00σ and +2.00σ from the mean I.Q., find the corresponding school marks in standard-score form.

9. Basing your answer upon your experience and general knowledge of psychology, decide whether the correlation between the following pairs of variables is most probably (1) positive or negative; (2) high, medium, or low.
(a) Intelligence of husbands and wives.
(b) Brain weight and intelligence.
(c) High-school grades in history and physics.
(d) Age and radicalism.
(e) Extroversion and college grades.

10. How much more will an r of .80 reduce a given σ(est) than an r of .40? An r of .90 than an r of .40?

11. (a) Determine k and E for the following r's: .35; −.50; .70; .95. Interpret your results.
(b) What is the "forecasting efficiency" of an r of .45? An r of .99?

12. The correlation of a criterion with a test battery is .75. What percent of the variance of the criterion is associated with variability in the battery? What percent is independent of the battery?


ANSWERS

1. Y = .40X + 24.12; X = 1.26Y − 11.52
   (a) σ(est. Y) = 1.78; σ(est. X) = 3.16
   (b) 36.12 inches; 42.12 inches; 33.84 pounds; 38.88 pounds
2. 85.2; σ(est. Y) = 7.0
3. X = .37Y + 8.16. When Y (I.Q.) is 100, X (algebra) is 45.2; σ(est. X) = 6.8
4. (a) Y = .96X − 2; X = .54Y + 37.2
   (b) 60.4; σ(est. Y) = 5.5
   (c) 22%
5. (a) 21
   (b) 9
   (c) 17.5 and 7 (i.e., 70%)
7. r = .65
8. +.46 and +.92
10. Five times as much; seven times as much.
11. (a)   r       k      E
          .35    .94    .06
         −.50    .87    .13
          .70    .71    .29
          .95    .31    .69
    (b) 11%; 86%
12. 56%; 44%


CHAPTER XI 


FURTHER METHODS OF CORRELATION 


In Chapters IX and X, we described the linear, or product-moment, correlation methods, and showed how, by means of r and the regression equations, one can "predict" or "forecast" values of one variable from a knowledge of the other. The linear correlation coefficient is useful in psychology and education as a measure, primarily, of the relationship between test scores and other determinations of performance. Test scores (as we have seen) represent a series of measurements of a continuous variable taken along a numerical scale. Many situations arise, however, in which the investigator does not have scores and must work with data in which differences in merit or capacity can be expressed only by ranks (e.g., in orders of merit), or by classifying an individual into one of several descriptive categories. This is especially true in vocational and applied psychology and in the field of personality and character measurement. Again, there are problems in which the relationship among the measurements made is non-linear, and cannot be described by the product-moment r. In such cases other methods of determining correlation must be employed; and the purpose of this chapter is to develop some of the more useful of these techniques.


I. COMPUTING CORRELATION FROM RANKS


Differences among individuals in many traits can often be expressed by ranking the subjects in one-two-three order when such differences cannot be measured directly. Persons, for example, may be ranked in order of merit for honesty, athletic ability, salesmanship, or social adjustment when it is impossible to measure these complex behaviors. In like manner, various products or specimens such as advertisements, color combinations, handwriting, compositions, jokes, and pictures which are admittedly hard to measure may be put in order of merit for esthetic quality, beauty, humor, or some other characteristic. In computing the correlation between two series of ranks, special methods which take account of relative position have been devised. These methods may also be applied to scores which have been arranged in order of merit. Even when our data are scores representing quantitative determinations on a metric scale, if we have only a few (fewer than twenty-five, for example), it is often advisable to rank them in order of merit and compute the correlation by the rank-difference method instead of by the longer and more laborious product-moment method. Coefficients of correlation calculated from a few cases are not very reliable at best, and their chief value lies in suggesting the possible existence of relationship, as in a preliminary survey. In such situations the rank-difference method will probably give as adequate a result as that obtained by a more refined technique, and is much easier to apply.


1. Calculation of ρ (rho) by the Method of Rank-Difference
The method of rank-difference is illustrated in Table 52. The problem is to find the relationship between the length of service and the selling efficiency of twelve salesmen. The names of the men (A, B, C, etc.) are listed in column (1) of the table, and in column (2), opposite the name of each man, is given the number of years he has been in the service of the company. In column (3), the men are ranked in order of merit in accordance with their length of service. For example, G, who has been longest with the company, is ranked 1; C, whose length of service is next longest, is ranked 2; and so on down the list. Note that both A and J have the same period of service, and that each is ranked 7.5. Instead of ranking the first man 7 and the second man 8, or both 7 or both 8, we compromise by ranking both 7.5 and F, who follows, 9.*

* If three men tie for the ranks 7, 8, and 9, each is ranked 8 and the next man in order is ranked 10. If four men tie for the ranks 7, 8, 9, and 10, each is ranked 8.5 and the next in order 11.


TABLE 52

TO ILLUSTRATE THE RANK-DIFFERENCE METHOD OF MEASURING CORRELATION

  (1)        (2)        (3)          (4)            (5)              (6)
Salesmen   Years of   Order of     Order of       Difference       Difference
           Service    Merit        Merit          between Ranks    Squared
                      (Service)    (Efficiency)   (D)              (D²)
   A          5          7.5           6              1.5             2.25
   B          2         11.5          12               .5              .25
   C         10          2             1              1.0             1.00
   D          8          4             9              5.0            25.00
   E          6          6             8              2.0             4.00
   F          4          9             5              4.0            16.00
   G         12          1             2              1.0             1.00
   H          2         11.5          10              1.5             2.25
   I          7          5             3              2.0             4.00
   J          5          7.5           7               .5              .25
   K          9          3             4              1.0             1.00
   L          3         10            11              1.0             1.00
N = 12                                                         ΣD² = 58.00

$$\rho = 1 - \frac{6 \times 58}{12(144 - 1)} = 1 - .20 = .80 \qquad (64)$$


In column (4) the men have been ranked by the sales manager in order of merit for efficiency as salesmen: C, the most efficient man, is ranked 1; and B, the least efficient, is ranked 12. In column (5) the difference (designated D) between each man's efficiency rank and his years-of-service rank is entered; and in the last column each of these D's has been squared. Since each D is squared in column (6), no account need be taken of + and − signs in column (5). The correlation between the two orders of merit may now be computed by substituting for ΣD² and N in the formula

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} \qquad (64)$$

(rank correlation coefficient, ρ)

in which D represents the difference in rank of an individual in the two series; ΣD² is the sum of the squares of all such differences; and N is the number of cases. Substituting 58 for ΣD² and 12 for N in formula (64), we obtain a ρ of .80. The symbol ρ (read as rho) is the rank-order coefficient of correlation.


ρ may be transmuted into a product-moment r by means of tables, but the difference between ρ and its equivalent r is so small that with little loss of accuracy ρ may be taken as equal directly to r.
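The Table 52 arithmetic can be reproduced with a short Python sketch. `ranks_descending` gives rank 1 to the largest value and lets tied values share the mean of the ranks they occupy (the rule described in the footnote on ties), and `rank_difference_rho` applies formula (64). Both function names are hypothetical helpers introduced here; the years of service and the efficiency ranks are copied from columns (2) and (4).

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values that begins at position `start`
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1   # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def rank_difference_rho(ranks_x, ranks_y):
    """Formula (64): rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)). Returns (rho, sum_d2)."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


# Table 52, salesmen A through L
years = [5, 2, 10, 8, 6, 4, 12, 2, 7, 5, 9, 3]
efficiency_rank = [6, 12, 1, 9, 8, 5, 2, 10, 3, 7, 4, 11]

service_rank = ranks_descending(years)
rho, sum_d2 = rank_difference_rho(service_rank, efficiency_rank)
print("service ranks:", service_rank)
print(f"sum of D^2 = {sum_d2}, rho = {rho:.2f}")
```

The service ranks come out as in column (3), including the shared ranks 7.5 for A and J and 11.5 for B and H, and ΣD² = 58 gives ρ ≈ .797, which the text rounds to .80.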


2. The Significance of ρ (rho)
If ρ is small and N reasonably large (thirty or more), the SE of ρ can be determined by the following formula:

$$SE_\rho = \frac{1.05(1 - \rho^2)}{\sqrt{N - 1}} \qquad (65)^{*}$$

(standard error of ρ, rank-order coefficient of correlation)

Whenever N is small, the SE of ρ is likely to be larger than the value given by the formula, as the sampling distribution of ρ is not normal (p. 297). For this reason, a ρ computed from fewer than thirty cases must always be interpreted with caution. A better method of determining significance, especially when ρ is large, is to test the obtained ρ against the null hypothesis, that is, to use Table 49, page 299. For example, we find that for N − 2, or 10 degrees of freedom (Table 49), an r must be .71 to be significant at the .01 level. Since our ρ (or r) of .80 is considerably larger than this value, it is clearly very significant although N is small.

If a calculated ρ is .40, say, and N is 28, the SE, by (65), is .16. As ρ is 2.5 times its SE, from Table 17 it is almost significant at the .01 level and clearly significant at the .05 level. A better test of significance (which does not assume normality of the sampling distribution) is to compute t by formula (53), page 298, viz.,

$$t = \frac{.40\sqrt{26}}{\sqrt{1 - .40^2}} = 2.22$$

From Table 29, we note that when N − 2 = 26, t is 2.06 at P = .05 and 2.78 at P = .01. Hence, ρ is significant at the .05 level, but is not significant at the .01 level. This same result can be obtained directly from Table 49. We find, for instance, for N − 2 = 26 that an r must be .37 to be significant at the .05 level, and .48 to be significant at the .01 level.

*
$$PE_\rho = \frac{.7063(1 - \rho^2)}{\sqrt{N - 1}} \qquad (65a)$$
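The t test of formula (53), and the Table 49 entries it implies, can be sketched as follows. Solving formula (53) for r gives the smallest r that reaches a given critical t: r = t/√(t² + df). The critical t values are typed in by hand from a standard two-tailed t table; they are an assumption of this sketch, not computed, and the function names are hypothetical.

```python
import math


def t_for_r(r, n):
    """Formula (53): t = r * sqrt(N - 2) / sqrt(1 - r**2), with N - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def critical_r(t_crit, df):
    """Smallest r that reaches a given critical t, found by solving formula (53) for r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed critical t values typed in from a standard t table (assumed, not computed)
T_01_DF10 = 3.169
T_05_DF26 = 2.056
T_01_DF26 = 2.779

print(f"t for rho = .40, N = 28: {t_for_r(0.40, 28):.3f}")
print(f"r needed, df = 10, .01 level: {critical_r(T_01_DF10, 10):.2f}")
print(f"r needed, df = 26, .05 level: {critical_r(T_05_DF26, 26):.2f}")
print(f"r needed, df = 26, .01 level: {critical_r(T_01_DF26, 26):.2f}")
```

The computed t is about 2.225 (given as 2.22 in the text), and the three critical values come out at .71, .37, and .48, agreeing with the Table 49 entries quoted above.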


3. Summary of the Rank-Difference Method

The product-moment method takes into account the size of the score as well as its position in the series. The rank-difference method takes account only of the positions of the items in the series. No allowance is made for the size of gaps between adjacent scores. Individuals, for example, who score 90, 89, and 70 on a given test are ranked 1, 2, 3 in order of merit, although the difference between 90 and 89 is 1, and the difference between 89 and 70 is 19. Considerable accuracy may be lost in translating scores over into ranks, as gaps will appear in the rankings when a number of scores, all of the same size, receive the same rating. The rank-difference method is rarely used with test scores when N is larger than thirty, and it often serves as an exploratory device.


II. MEASURING CORRELATION FROM DATA GROUPED INTO CATEGORIES


1. Bi-serial Correlation 

In many problems it becomes important to calculate the correlation between traits or attributes, when the members of the group can be measured (i.e., given scores) in the first variable, but can only be classified into two categories in the second or "dichotomous" variable. (The term dichotomous means "cut into two parts.") We may, for instance, wish to know the correlation between MA and "social adjustment" in a group of nursery-school children, when our subjects have been given scores in the first trait, but are simply classified as "socially adjusted" or "not socially adjusted" in the second trait. Other examples of dichotomous classification with reference to some attribute are athletic/non-athletic, Negro/White, radical/conservative, socially minded/mechanically minded, literate/illiterate, above eighth grade in school/below eighth grade, and the like. Many test and questionnaire items also are scored so as to give responses which fall into two categories; as, for example, problems marked Passed or Failed, statements marked True or False, personality inventory items answered Yes or No, interest test items marked Liked or Disliked, and so on. The correlation between a set of scores and a two-category classification (like those listed above) cannot be found by the ordinary product-moment formula or by the rank-difference method. However, if we can assume that the attribute which we have classified into two categories could be measured in finer units, the correlation between such a trait and a set of scores may be computed by the bi-serial method.

(1) Calculation of Bi-serial r 


The problem is to find the correlation between total scores on a test and the answers to a single item on the test (Item 72), that is, to find whether those who score high on the test tend to answer Item 72 "Yes" more often than "No." The first column of Table 53 gives the class intervals of the score distribution; column two gives the distribution of scores made by the sixty subjects who answered "Yes" to Item 72, and column three the distribution of scores made by the forty subjects who answered "No." The sum of all of the frequencies on the score intervals gives the total distribution of 100 cases (column four). The steps in calculating bi-serial r from here on are as follows:

Step 1 


Calculate M_p, the mean of the scores made by the sixty subjects who answered "Yes" to Item 72. Also calculate M_q, the mean of the scores made by the forty subjects who answered "No" to Item 72. In our problem, M_p = 60.08, and M_q = 55.00.


TABLE 53

TO ILLUSTRATE THE CALCULATION OF THE BI-SERIAL r BETWEEN TOTAL SCORES ON A TEST AND THE ANSWERS TO A SINGLE ITEM ON THE TEST

Scores        Responses to Item 72       Total
on Test        "Yes"        "No"
80-84            3                          3
75-79            4            2             6
70-74            6            2             8
65-69            5            5            10
60-64           10            9            19
55-59           10            5            15
50-54           15            5            20
45-49            4            3             7
40-44            3            2             5
35-39                         4             4
30-34                         2             2
25-29                         1             1
                60           40           100
                (p)          (q)

M_T = 58.05; mean of all scores (N = 100)
σ = 11.63; σ of all scores (N = 100)
M_p = 60.08; mean of "Yes" responses (N = 60)
M_q = 55.00; mean of "No" responses (N = 40)
p = .60; proportion answering "Yes" to Item 72
q = .40; proportion answering "No" to Item 72
z = .386; height of ordinate separating 60% from 40% in a normal distribution (Table 54)

$$r_{\text{bis}} = \frac{M_p - M_q}{\sigma} \times \frac{pq}{z} = \frac{60.08 - 55.00}{11.63} \times \frac{(.60)(.40)}{.386} = .27 \qquad (66)$$

$$\sigma_{r_{\text{bis}}} = \frac{\dfrac{\sqrt{pq}}{z} - r_{\text{bis}}^2}{\sqrt{N}} = \frac{\dfrac{\sqrt{(.60)(.40)}}{.386} - (.27)^2}{\sqrt{100}} = .12 \qquad (67)$$
Step 2 


Calculate the σ of the whole distribution, that is, the distribution of the 100 scores. This σ, which equals 11.63, gives the spread of the test scores in the entire group.


Step 3

Sixty percent of the group (p) answered "Yes" to Item 72, and 40% (q) answered "No" (p always equals 1 − q). Assuming a normal distribution of opinion on this item (varying from complete agreement on through indifference to complete disagreement) upon which a dichotomous division has been forced, we place the dividing line between the "Yes" and "No" groups at a distance of 10% of the total area from the middle of the curve, as shown in Fig. 55.


Fig. 55.

From Table 54, the height of the ordinate (i.e., z) at the point which is 10% of the area from the mean of a normal distribution is .386.
Step 4 

Having computed M_p, M_q, σ, p, q, and z, we find r_bis from the formula

$$r_{\text{bis}} = \frac{M_p - M_q}{\sigma} \times \frac{pq}{z} \qquad (66)$$

(bi-serial coefficient of correlation or bi-serial r)

in which, as illustrated by the problem above and shown in Table 53,

M_p = mean of the group in the first category (usually the group showing superior or more desirable characteristics)
M_q = mean of the group in the second category
σ = standard deviation of the entire group
p = proportion of the whole group in category one
q = proportion of the whole group in category two (p + q = 1)
z = height of the ordinate in the normal curve dividing p from q

In Table 53, r_bis is .27, indicating a tendency, though not a strong one, for "Yes" answers to Item 72 to accompany high total scores.
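Steps 1 to 4 can be reproduced from the frequency columns of Table 53 with the Python sketch below. Its assumptions: each class interval is represented by its midpoint (82 for 80-84, down to 27 for 25-29), σ is the population form (dividing by N), and z is obtained from the standard-library `statistics.NormalDist` rather than from Table 54. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

# Table 53, class intervals 80-84 down to 25-29, represented by their midpoints
midpoints = [82, 77, 72, 67, 62, 57, 52, 47, 42, 37, 32, 27]
yes_freq = [3, 4, 6, 5, 10, 10, 15, 4, 3, 0, 0, 0]
no_freq = [0, 2, 2, 5, 9, 5, 5, 3, 2, 4, 2, 1]


def grouped_mean(mids, freqs):
    return sum(m * f for m, f in zip(mids, freqs)) / sum(freqs)


def grouped_sd(mids, freqs):
    """Population standard deviation (divide by N) of grouped data."""
    mean = grouped_mean(mids, freqs)
    return math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / sum(freqs))


total_freq = [y + n for y, n in zip(yes_freq, no_freq)]
m_p = grouped_mean(midpoints, yes_freq)     # Step 1
m_q = grouped_mean(midpoints, no_freq)
sigma = grouped_sd(midpoints, total_freq)   # Step 2
p = sum(yes_freq) / sum(total_freq)         # Step 3
q = 1 - p
unit = NormalDist()
z = unit.pdf(unit.inv_cdf(q))               # ordinate where the upper p of the area begins

r_bis = (m_p - m_q) / sigma * (p * q / z)   # Step 4, formula (66)
print(f"M_p = {m_p:.2f}  M_q = {m_q:.2f}  sigma = {sigma:.2f}  z = {z:.3f}  r_bis = {r_bis:.2f}")
```

The sketch should reproduce the table's 60.08, 55.00, 11.63, .386, and .27; computing z directly also avoids interpolating in Table 54 when p is not a round proportion.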
(2) The SE of Bi-serial r

Provided neither p nor q is very small (e.g., smaller than .05), an approximate formula for the standard error of bi-serial r is

$$\sigma_{r_{\text{bis}}} = \frac{\dfrac{\sqrt{pq}}{z} - r_{\text{bis}}^2}{\sqrt{N}} \qquad (67)$$

(SE of r_bis for values of p and q greater than .05)


The SE of the r_bis of .27 found in Table 53 is .12, and the critical ratio is .27/.12, or 2.25. From Table 29 we find this bi-serial r to be significant at the .05, but not at the .01 level.*
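A minimal Python sketch of formula (67) and the critical-ratio check, using the Table 53 values and the 1.98 and 2.63 cut-offs quoted in the footnote to this paragraph; the function name is hypothetical.

```python
import math


def se_biserial(r_bis, p, z, n):
    """Formula (67): (sqrt(pq)/z - r_bis**2) / sqrt(N); not meant for very small p or q."""
    q = 1 - p
    return (math.sqrt(p * q) / z - r_bis ** 2) / math.sqrt(n)


se = se_biserial(0.27, 0.60, 0.386, 100)
cr = 0.27 / se
print(f"SE = {se:.3f}, CR = {cr:.2f}")
print("significant at .05:", cr > 1.98, "| at .01:", cr > 2.63)
```

Because the sketch divides by the unrounded SE (about .1196) rather than .12, its critical ratio is about 2.26 instead of 2.25; the conclusion (significant at .05, not at .01) is unchanged.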


TABLE 54

DEVIATES (x/σ) IN TERMS OF σ-UNITS AND ORDINATES (z) FOR GIVEN AREAS MEASURED FROM THE MEAN OF A NORMAL DISTRIBUTION WHOSE TOTAL AREA = 1.00

[x/σ = x]

Area from    x or      z        Area from    x or
the Mean     (x/σ)              the Mean     (x/σ)
(a)                             (a)
 .00         .000     .399       .26          .706
 .01         .025     .399       .27          .739
 .02         .050     .398       .28          .772
 .03         .075     .398       .29          .806
 .04         .100     .397       .30          .842
 .05         .126     .396       .31          .878
 .06         .151     .394       .32          .915
 .07         .176     .393       .33          .954
 .08         .202     .391       .34          .995
 .09         .228     .389       .35         1.036
 .10         .253     .386       .36         1.080
 .11         .279     .384       .37         1.126
 .12         .305     .381       .38         1.175
 .13         .332     .378       .39         1.227
 .14         .358     .374       .40         1.282
 .15         .385     .370       .41         1.341
 .16         .412     .366       .42         1.405
 .17         .440     .362       .43         1.476
 .18         .468     .358       .44         1.555
 .19         .496     .353       .45         1.645
 .20         .524     .348       .46         1.751
 .21         .553     .342       .47         1.881
 .22         .583     .337       .48         2.054
 .23         .613     .331       .49         2.326
 .24         .643     .324       .50            ∞
 .25         .675     .318

* At the .05 level the CR is 1.98, and at the .01 level 2.63, when df = 99.


(3) An Alternative Formula for Bi-serial r
There is another, slightly different, formula for bi-serial r which is often useful. This is

$$r_{\text{bis}} = \frac{M_p - M_T}{\sigma} \times \frac{p}{z} \qquad (68)$$

(bi-serial coefficient of correlation or bi-serial r in terms of M_T, the mean of the total group)

in which
M_p = mean of the group in the first (or p) category
M_T = mean of entire group
σ = standard deviation of entire group
p = proportion of whole group in category one
z = height of ordinate in normal curve dividing p from q

Substituting in formula (68) the values for M_p, M_T, σ, p, and z shown in Table 53, we have

$$r_{\text{bis}} = \frac{60.08 - 58.05}{11.63} \times \frac{.60}{.386} = .27$$

which checks our previous result.

Formula (68) is especially well suited to those problems in which sub-groups having different characteristics are drawn from a larger group, the larger group mean (M_T) remaining the same.

The bi-serial correlation method has frequently been used in determining item validity,* that is, in finding whether success or failure upon a given item is correlated with total score in the test or with score in some criterion (Table 53). If those who achieve high scores in the criterion get an item right more often than those who make low scores, the item will be positively correlated with the criterion. Such an item is a better measure of the criterion than one which correlates zero or negatively with criterion scores.

* Long, J. A., and Sandiford, Peter, The Validation of Test Items, Department of Educational Research, University of Toronto, Bulletin 7 (1935), 16-17.


When items are scored 1 if correct and 0 if incorrect, the assumption of normality in the distribution of responses to any given item is not warranted.* Formula (69) below gives a bi-serial coefficient which does not assume continuity in the distribution of single test items, and is recommended for use in item analysis:

$$r_{\text{bis}} = \frac{M_p - M_q}{\sigma}\sqrt{pq} \qquad (69)$$

(bi-serial coefficient of correlation for use in item analysis)

Formula (68) may be, and generally is, used in determining item validity, but (69) is somewhat more defensible mathematically, and is easier to apply. The validity-index of Item 72 (Table 53) by formula (69) is .21.
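Formulas (68) and (69) can be checked against the Table 53 summary values with the Python sketch below; the function names are hypothetical. Formula (68) necessarily agrees with (66): since M_T = pM_p + qM_q, the difference M_p − M_T equals q(M_p − M_q), and substituting this into (68) gives (66) term for term.

```python
import math


def biserial_total_mean(m_p, m_total, sigma, p, z):
    """Formula (68): (M_p - M_T) / sigma * (p / z)."""
    return (m_p - m_total) / sigma * (p / z)


def item_biserial(m_p, m_q, sigma, p):
    """Formula (69): (M_p - M_q) / sigma * sqrt(pq); no normality assumed for the item."""
    return (m_p - m_q) / sigma * math.sqrt(p * (1 - p))


# Summary values from Table 53
r68 = biserial_total_mean(60.08, 58.05, 11.63, 0.60, 0.386)
r69 = item_biserial(60.08, 55.00, 11.63, 0.60)
print(f"formula (68): {r68:.2f}   formula (69): {r69:.2f}")
```

Formula (69) replaces the factor pq/z of (66) with √(pq), and that factor is smaller here, which is why the item index (.21) is lower than the bi-serial r (.27).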


2. Tetrachoric Correlation 

We have seen in the last section that when one variable is continuous and is expressed in the form of test scores, and the other is dichotomous or in a twofold classification, bi-serial r gives a measure of the relationship between the two variables. An extension of this problem to which bi-serial r is not applicable presents itself when both variables are dichotomous. We then have a 2 × 2 or fourfold table, from which a modified form of the product-moment coefficient, called tetrachoric r, may be calculated. Tetrachoric r is useful when one wishes to find the relationship between two characters or attributes neither of which is directly measurable, but both of which are capable of being separated into two categories. Thus, if we wish to measure the correlation between school attendance and employment, persons might be classified into those who have attended high school and those who have not, and into those who are employed and those who are unemployed. Or if we wish to discover the correlation between intelligence and social maturity, children might be classified as "above average" and "below average" in intelligence, on the one hand, and as socially mature and socially immature on the other. The tetrachoric correlation method assumes that the two variables being studied are essentially continuous, and would be normally distributed if it were possible to classify them more exactly into finer groupings.

* Richardson, M. W., and Stalnaker, J. L., "A Note on the Use of Bi-serial r in Test Research," Journal of General Psychology, 8 (1933).


(1) Calculation of Tetrachoric r 


Table 55 illustrates a 2 × 2 fold table, and shows the steps involved in calculating tetrachoric r. The problem is to find whether a larger number of successful than of unsuccessful salesmen tend to be "socially well adjusted." The data are artificial. The X-variable (along the top of the diagram) is divided into two categories, "successful" and "unsuccessful"; and the Y-variable (along the left of the diagram) is divided into two categories, "socially well adjusted" and "socially poorly adjusted." The sums of the rows show that sixty salesmen (a + b) out of the sample of 100 are classed as well adjusted socially, and that forty salesmen (c + d) are classed as poorly adjusted socially.* The proportions in each category (p and q) are 60% and 40%, respectively. The sums of the columns show that fifty-five of the 100 salesmen are classified as unsuccessful, and forty-five as successful; the proportions are 55% (q') and 45% (p'). On the assumption that "social adjustment" is distributed normally, from the proportions p = .60 and q = .40 we obtain x = −.253 and z = .386. These last two values are read from Table 54 as follows: the perpendicular line (i.e., the ordinate, z) separating the upper 60% from the lower 40% in a normal curve is just 10% of the area from the mean. Hence, entering the first column of Table 54 with a = .10, we read x = −.253 (the minus sign showing that the point lies below the mean) and z = .386. See Fig. 56.

The x' and z' values corresponding to p' = .45 and q' = .55 are calculated in the same way. The perpendicular line dividing the upper 45% (the percent successful) from the lower 55% (the percent unsuccessful) is 5% of the area from the mean; and from Table 54, for a = .05, x' = .126 and z' = .396.

* To accord with the plan of the ordinary correlation table (p. 280), the categories in Table 55 have been so arranged that concentration of data in the first and third quadrants (a and d) denotes positive correlation, and concentration of data in the second and fourth quadrants (b and c), negative correlation.

TABLE 55

TO ILLUSTRATE THE CALCULATION OF TETRACHORIC r (r_t)
(The data are hypothetical)

                                      X-variable: 100 Salesmen
                                  Unsuccessful    Successful    Totals
Y-variable   Socially Well           25 (b)         35 (a)        60     p = 60%
             Adjusted
             Socially Poorly         30 (d)         10 (c)        40     q = 40%
             Adjusted
             Totals                  55             45           100
                                   q' = 55%       p' = 45%

For p = .60, q = .40, a = .10:   x = −.253, z = .386 (Table 54; Fig. 56)
For p' = .45, q' = .55, a = .05: x' = .126, z' = .396 (Table 54; Fig. 56)

$$\frac{ad - bc}{N^2 z z'} = r_t + \frac{x x'}{2} r_t^2$$

$$\frac{1050 - 250}{100^2(.386)(.396)} = r_t + \frac{(-.253)(.126)}{2} r_t^2$$

$$.523 = r_t - .016 r_t^2$$

$$\text{or} \quad .016 r_t^2 - r_t + .523 = 0^{*}$$

$$r_t = \frac{+1 \pm \sqrt{1 - 4(.016)(.523)}}{2 \times .016} = \frac{+1 \pm \sqrt{1 - .033472}}{.032} = \frac{+1 \pm .9831}{.032}$$

r_t = .53 (taking numerator as +1 − .9831)
    = 62 (taking numerator as +1 + .9831)

* The general form of a quadratic equation is ax² + bx + c = 0. The two values of x (i.e., the roots of the equation) may be computed by the formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

In the equation .016r² − r + .523 = 0, a = .016, b = −1.00, and c = .523. Hence,

$$r = \frac{+1 \pm \sqrt{1 - 4(.016)(.523)}}{2 \times .016} = .53 \text{ or } 62 \text{ (an impossible value)}$$

Fig. 56. Normal curve divided at x = −.253 into "well adjusted" and "poorly adjusted" segments.


An approximate formula for tetrachoric r may be written as follows:

$$\frac{ad - bc}{N^2 z z'} = r_t + \frac{x x'}{2} r_t^2 \qquad (69a)$$

(approximate formula for tetrachoric r)*

in which
x and x' = σ-distances from the means to the points separating the proportion in the upper category from the proportion in the lower category;
z and z' = the heights of the ordinates at the points of division;
a, b, c, d = entries in the four cells (see Table 55);
N = number of cases, i.e., sum of entries in the four cells;
r_t = the tetrachoric coefficient of correlation.

* Pearson, Karl, "On the Correlation of Characters Not Quantitatively Measurable," Philosophical Transactions, Royal Society of London, Series A, 195 (1900), 1-47.


In Table 55, ad is found to equal 1050, and bc to equal 250. Substituting for these quantities, and for x, x', z, z', and N in formula (69a), we obtain r_t = .53. This coefficient indicates a fairly substantial correlation between success in salesmanship and social adjustment. In order to compute r_t it is necessary to solve a quadratic equation. The method of carrying through this solution is given in Table 55 and in the footnote at the bottom of the table. Note that only the first of the two solutions for r_t is a possible value, as the second is greater than unity.
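The quadratic solution of formula (69a) can be sketched in Python as below, assuming the standard-library `statistics.NormalDist` in place of Table 54 for x, x', z, and z'. The function name and its argument order (a, b, c, d as the cells are labelled in Table 55) are choices made for this sketch; like the text, it keeps only the root lying between −1 and +1.

```python
import math
from statistics import NormalDist


def tetrachoric_approx(a, b, c, d):
    """Solve (ad - bc) / (N^2 z z') = r + (x x' / 2) r^2 for r (formula 69a).

    Cells are labelled as in Table 55: a = upper right, b = upper left,
    c = lower right, d = lower left. Counts or proportions (N = 1) both work.
    """
    n = a + b + c + d
    p = (a + b) / n            # proportion in the upper row of the Y-variable
    p_prime = (a + c) / n      # proportion in the right-hand column of the X-variable
    unit = NormalDist()
    # Signed sigma-distances of the dividing points from the mean
    x = unit.inv_cdf(1 - p)
    x_prime = unit.inv_cdf(1 - p_prime)
    z, z_prime = unit.pdf(x), unit.pdf(x_prime)

    lhs = (a * d - b * c) / (n ** 2 * z * z_prime)
    k = x * x_prime / 2        # coefficient of r^2
    if k == 0:
        return lhs
    # k r^2 + r - lhs = 0: keep the root that lies between -1 and +1
    root = math.sqrt(1 + 4 * k * lhs)
    candidates = [(-1 + root) / (2 * k), (-1 - root) / (2 * k)]
    valid = [r for r in candidates if -1 <= r <= 1]
    if not valid:
        raise ValueError("no admissible root; the two-term approximation breaks down")
    return valid[0]


print(f"Table 55: r_t = {tetrachoric_approx(35, 25, 10, 30):.2f}")
print(f"Table 56: r_t = {tetrachoric_approx(0.35, 0.24, 0.12, 0.29):.2f}")
```

With unrounded normal-curve values the sketch gives about .53 for Table 55 and .47 for the percentages of Table 56, in line with the hand computations. It carries only the two terms of formula (69a), so it shares that formula's approximate character.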

The investigator who finds it necessary to calculate many tetrachoric r's may greatly shorten his work by using the computing diagrams devised by Thurstone and his co-workers.* These charts enable one to obtain a solution for r_t by graphic methods as soon as the proportion within each of the four cells of the table is known.


(2) The SE of a Tetrachoric r

The formula for σ_rt is an exceedingly complex expression and is not reproduced here. The derivation of σ_rt will be found in books dealing with the mathematics of statistical theory. The computation of σ_rt can be greatly shortened by the use of Pearson's Tables XXIII and XXIV.† An approximation to the SE of a tetrachoric r may be found in the following way: σ_rt is about 50% higher than the SE of an equivalent product-moment r, that is, a product-moment r equal to the given r_t and calculated from a sample of the same size as that upon which r_t is based. The SE of a product-moment r of .53 is .07 for N = 100; hence the SE of a tetrachoric r of .53 is approximately .07 × 1.5, or .11. The obtained r_t of .53 is nearly five times its SE and is, therefore, significant at the .01 level (Table 17).

* … the Tetrachoric Correlation Coefficient, University of Chicago …
† … C. C., and Van …, W., … Statistical Procedures and … Introduction, xl-xli, and p. 35.
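The rule of thumb above can be sketched in a few lines of Python. The sketch assumes the usual large-sample formula (1 − r²)/√N for the SE of a product-moment r, which reproduces the text's .07 for r = .53 and N = 100; the function names are hypothetical.

```python
import math


def se_product_moment_r(r, n):
    """Large-sample SE of a product-moment r (assumed here): (1 - r**2) / sqrt(N)."""
    return (1 - r ** 2) / math.sqrt(n)


def se_tetrachoric_rough(r_t, n):
    """Rule of thumb from the text: about 1.5 times the SE of an equal product-moment r."""
    return 1.5 * se_product_moment_r(r_t, n)


se_r = se_product_moment_r(0.53, 100)
se_rt = se_tetrachoric_rough(0.53, 100)
print(f"SE(r) = {se_r:.2f}, SE(r_t) = {se_rt:.2f}, r_t / SE = {0.53 / se_rt:.1f}")
```

This is only the rough 50% inflation described in the text, not the exact σ_rt, which requires Pearson's tables.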

Tetrachoric r is often used as a means of evaluating a test's efficiency in separating two contrasted or “criterion” groups. An example is given in Table 56 (the data are artificial). The problem is to find whether a test of deductive reasoning (here, a syllogism test) will differentiate fifty-nine college juniors majoring in science from sixty-six college juniors majoring in literature or languages (non-science). The X-variable is divided into science majors and non-science majors; the Y-variable into those above and those below the mean of the test, i.e., the mean score established by the entire junior class. The entries in the cells, a, b, c, and d, are expressed in percents, so that N² in formula (69a) is 1.00. As shown in Table 56, the correlation between majoring in science and high scores on the syllogism test is .47. If one were investigating a number of tests with a view toward determining their relative values as indicators of scientific aptitude, the worth of each test could be measured in accordance with its ability to separate the two criterion groups.*


TABLE 56

TO ILLUSTRATE THE USE OF TETRACHORIC r IN EVALUATING A TEST

                          X-variable: College Juniors
Y-variable:               Non-Science    Science
Syllogism Test            Majors         Majors         Totals
Above Test Mean           24% (b)        35% (a)        p = 59%
Below Test Mean           29% (d)        12% (c)        q = 41%
Totals                    q' = 53%       p' = 47%       100%

For p = .59, q = .41:    x = −.228,   z = .389
For p' = .47, q' = .53:  x' = .075,   z' = .398

$$\frac{.1015 - .0288}{(.389)(.398)} = r_t + \frac{(-.228)(.075)}{2}\,r_t^2$$

$$.470 = r_t - .009\,r_t^2, \quad\text{or}\quad .009\,r_t^2 - r_t + .470 = 0$$

$$r_t = \frac{1 \pm \sqrt{1 - 4(.009)(.470)}}{2(.009)} = \frac{1 \pm .9915}{.018}$$

$r_t$ = .47, or 111 (an impossible value)
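The quadratic solution of Table 56 can be reproduced as a short Python sketch. It assumes, as the table does, that the series for $r_t$ is cut off after the $r_t^2$ term, and it takes the deviates x, x' and ordinates z, z' from the standard normal curve with `statistics.NormalDist` (Python 3.8+) instead of from a printed table. The function name and the cell-argument convention are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def tetrachoric_two_term(a, b, c, d):
    """Approximate tetrachoric r from four cell proportions (a + b + c + d = 1).

    Cells follow Table 56: a = above mean / science, b = above / non-science,
    c = below / science, d = below / non-science.  Only two terms are kept:
        (ad - bc) / (z z') = r + (x x' / 2) r**2
    """
    unit = NormalDist()                          # standard normal curve
    p_y = a + b                                  # p: proportion above the Y split
    p_x = a + c                                  # p': proportion in the a, c column
    x = unit.inv_cdf(1 - p_y)                    # -.228 when p = .59
    x_prime = unit.inv_cdf(1 - p_x)              # .075 when p' = .47
    z, z_prime = unit.pdf(x), unit.pdf(x_prime)  # ordinates at the two splits
    lhs = (a * d - b * c) / (z * z_prime)
    k = x * x_prime / 2                          # coefficient of the r**2 term
    if abs(k) < 1e-12:                           # a split at the mean: linear case
        return lhs
    disc = 1 + 4 * k * lhs                       # roots of k r**2 + r - lhs = 0
    roots = [(-1 + s * sqrt(disc)) / (2 * k) for s in (1, -1)]
    # keep the root that is a possible correlation (the other is about 117)
    return next(r for r in roots if -1 <= r <= 1)

r_t = tetrachoric_two_term(a=.35, b=.24, c=.12, d=.29)
print(round(r_t, 2))  # 0.47
```

Because the deviates are not rounded to three places, the unrounded result is about .472, which agrees with the .47 of the table.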


3. The Contingency Coefficient 

The coefficient of contingency, C, is often used to determine
relationship when the variables under study can be put into more
than two classes or categories. The contingency coefficient may
be derived directly from $\chi^2$ (p. 241); but it differs from $\chi^2$ in
that it provides a measure of correlation which under certain
conditions (p. 363) is comparable to the product-moment r.


C bears the following relation to $\chi^2$:

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \qquad (70)$$

(a formula for C, the contingency coefficient, in terms of $\chi^2$)

Calculating $\chi^2$ and applying formula (70), we find that the C of example 6 (1) (p. 376) is $\sqrt{31.43/(513 + 31.43)}$, or .24. Taken at face value, this C indicates a small degree of relationship between marriage-adjustment and education of husbands. To find whether the obtained C is indicative of a significant relationship, we should calculate its standard error. Unfortunately, the SE of C is a complex expression† and is somewhat laborious to compute.

* For a discussion of the application of tetrachoric r to problems involving two [...] or extreme groups in which the middle group is eliminated, see Peters, C. C., and Van Voorhis, W. R., Statistical Procedures and Their Mathematical Bases (1940), pp. 375-384.

† See Kelley, T. L., Statistical Method (1923), p. 269.

For a C of .00, however, $\sigma_C = 1/\sqrt{N}$, and this formula may be employed to give a rough test of the significance of an obtained C. On the null hypothesis the relationship between marriage-adjustment and education is .00, and its SE is $1/\sqrt{513}$, or .044. Our calculated C of .24 is .24/.044, or nearly 5.5 $\sigma_C$'s, from a C of .00. Hence, C = .24 may be considered to indicate a small but highly significant degree of correlation between marriage-adjustment and education of husbands.
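Formula (70) and the rough null-hypothesis test can be sketched in Python as follows. The function names are illustrative, and $\sigma_C = 1/\sqrt{N}$ is used only as the text uses it, for testing an obtained C against a C of .00.

```python
from math import sqrt

def contingency_from_chi2(chi2, n):
    """Formula (70): C = sqrt(chi2 / (N + chi2))."""
    return sqrt(chi2 / (n + chi2))

def se_c_null(n):
    """Rough SE of C on the null hypothesis (true C = .00): 1 / sqrt(N)."""
    return 1 / sqrt(n)

c = contingency_from_chi2(31.43, 513)
se0 = se_c_null(513)
print(round(c, 2), round(se0, 3), round(c / se0, 1))  # 0.24 0.044 5.4
```

Working from unrounded values the ratio is 5.4; the 5.5 in the text comes from the rounded .24/.044.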

When one is not directly interested in $\chi^2$ itself, it is possible
to compute C directly rather than by way of $\chi^2$. There are two
methods of calculating C which will be given in order.

(1) Method A of Calculating C 

Table 57 illustrates the computation of C from a 4 × 4 fold contingency table. The table gives the classification of 1000 fathers and sons with respect to eye color. The independence values for each cell have been computed as shown in Table 40 (p. 252). To repeat the method of calculation, 335/1000 of all sons are described as blue-eyed. This proportion of 358, the number of blue-eyed fathers (i.e., 335 × 358/1000), gives 120 as the number of blue-eyed fathers who might be expected to have blue-eyed sons “by chance,” as contrasted with the 194 blue-eyed fathers who actually did have blue-eyed sons. When the independence values have been found, we square each obtained cell entry and divide by its own independence value, as shown in Table 57. The sum of these quotients gives S; and from S and N, C is calculated by the formula

$$C = \sqrt{\frac{S - N}{S}} \qquad (71)$$

(formula for C, coefficient of contingency, calculated directly)

In Table 57, C is .46. On the null hypothesis, when C = .00, $\sigma_C = 1/\sqrt{1000}$, or .03. The obtained C of .46 is fifteen times its SE, and hence indicates a highly significant and fairly strong correlation between eye color in father and son.


TABLE 57

TO ILLUSTRATE THE CALCULATION OF C, THE COEFFICIENT OF
CONTINGENCY, BY METHOD A

Independence values are shown in parentheses.

                 Father's Eye Color
Son's Eye Color  Blue        Gray        Hazel       Brown       Totals
Blue             194 (120)   70 (88)     41 (60)     30 (66)     335
Gray             83 (102)    124 (75)    41 (51)     36 (56)     284
Hazel            25 (49)     34 (36)     55 (25)     23 (27)     137
Brown            56 (87)     36 (64)     43 (44)     109 (48)    244
Totals           358         264         180         198         1000

I. Independence Values
Each independence value = (row total × column total)/N; e.g., 335 × 358/1000 = 120, 137 × 264/1000 = 36, 244 × 198/1000 = 48.

II. Calculation of C
Blue-eyed fathers:   (194)²/120 = 313.6   (83)²/102 = 67.5    (25)²/49 = 12.8    (56)²/87 = 36.0
Gray-eyed fathers:   (70)²/88 = 55.7      (124)²/75 = 205.0   (34)²/36 = 32.1    (36)²/64 = 20.3
Hazel-eyed fathers:  (41)²/60 = 28.0      (41)²/51 = 33.0     (55)²/25 = 121.0   (43)²/44 = 42.0
Brown-eyed fathers:  (30)²/66 = 13.6      (36)²/56 = 23.1     (23)²/27 = 19.6    (109)²/48 = 247.5

S = 1270.8     N = 1000     S − N = 270.8
C = √(270.8/1270.8) = .46


C as computed carries no sign; it may be taken as plus or minus, the sign to be affixed depending upon an inspection of the contingency table itself. In Table 57 it is evident that pigmentation of eyes in father and son is positively correlated* and hence that C is positive.
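Method A translates directly into a short Python function. This sketch computes each independence value exactly (row total × column total / N) instead of rounding it to a whole number as Table 57 does, so S comes out slightly below the table's 1270.8; C still rounds to .46. The function and variable names are illustrative.

```python
from math import sqrt

def contingency_method_a(table):
    """Method A: S = sum of f**2 / independence value; C = sqrt((S - N) / S)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            independence = row_totals[i] * col_totals[j] / n  # chance expectation
            s += f * f / independence
    return s, sqrt((s - n) / s)

# Table 57: rows = son's eye color, columns = father's eye color,
# both in the order blue, gray, hazel, brown.
eye_color = [
    [194,  70, 41,  30],
    [ 83, 124, 41,  36],
    [ 25,  34, 55,  23],
    [ 56,  36, 43, 109],
]
s, c = contingency_method_a(eye_color)
print(round(c, 2))  # 0.46
```

Note that S − N is exactly $\chi^2$ for the table, which is why formulas (70) and (71) give the same C.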

A disadvantage of the contingency coefficient is the fact that
C does not remain constant for the same data when the number
of classes varies. The C calculated from a 3 × 3 fold table will
not ordinarily equal the C calculated from the same data ar-
ranged in, say, a 5 × 5 fold table. Moreover, the maximum
value which C can take will depend upon the fineness of the
classification employed. It can be shown† that


when the number of classes = 2, C cannot exceed .707
when the number of classes = 3, C cannot exceed .816
when the number of classes = 4, C cannot exceed .866
when the number of classes = 5, C cannot exceed .894
when the number of classes = 6, C cannot exceed .913
when the number of classes = 7, C cannot exceed .926
when the number of classes = 8, C cannot exceed .935
when the number of classes = 9, C cannot exceed .943
when the number of classes = 10, C cannot exceed .949
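The limits in this list agree with the expression $\sqrt{(k-1)/k}$ for a k × k fold table. That expression is not given in the text, so treat this short Python sketch as a check on the list under that assumption.

```python
from math import sqrt

def c_max(k):
    """Largest C attainable in a k x k fold table (assumed: sqrt((k - 1) / k))."""
    return sqrt((k - 1) / k)

for k in range(2, 11):
    print(k, f"{c_max(k):.3f}")  # 2 0.707, 3 0.816, ..., 10 0.949
```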


In the light of this table, Yule suggests that we “restrict the use of the ‘coefficient of contingency’ to 5 × 5 fold or finer classifications” in order that the maximum value of C may be as near unity as possible. At the same time, we should avoid a too-fine classification, or C will be affected by slight or “casual irregularities of no physical significance”; and, in addition, the arithmetic of calculation will be greatly (and needlessly) increased.

* We note, for example, that 194 blue-eyed fathers have blue-eyed sons, while only 30 brown-eyed fathers have blue-eyed sons. Moreover, 109 brown-eyed fathers have brown-eyed sons, while only 56 blue-eyed fathers have brown-eyed sons. Comparisons of this sort will show that the association between pigmentation in the eyes of father and son is positive.

† Yule, G. U., and Kendall, M. G., An Introduction to the Theory of Statistics (12th ed., 1940), p. 69.

Pearson* has worked out a correction for “broad categories” which should be applied to C's calculated from 4 × 4 fold or broader groupings if C is to be compared with r. For 5 × 5 fold or finer classifications, this correction is so small that for practical purposes it may be disregarded.

Since the classification in Table 57 is 4 × 4 fold, the value of C
will be increased if corrected for broad categories. An approxi-
mate correction, which is easier to apply than Pearson's correc-
tion, can be made by dividing the obtained C by the maximum
value which C can take in a 4 × 4 fold contingency table. In
the present problem, dividing our C of .46 by .866 (the maxi-
mum C for a 4 × 4 fold table), we obtain a “corrected C” of .53.
This value may be taken as approximately equal to r; it indi-
cates a fairly high correlation between pigmentation of eyes in
father and son.
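The approximate correction amounts to a single division. This minimal Python sketch again assumes the k × k maximum $\sqrt{(k-1)/k}$, which gives the .866 used above for k = 4. It is the simple approximation described in the paragraph, not Pearson's own broad-category correction.

```python
from math import sqrt

def corrected_c(c, k):
    """Approximate broad-category correction: C divided by its k x k maximum."""
    c_max = sqrt((k - 1) / k)  # .866 when k = 4
    return c / c_max

print(round(corrected_c(.46, 4), 2))  # 0.53
```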

The relation of C to r is, under certain conditions, very close.
C is substantially equivalent to r (1) when the grouping is rela-
tively fine, 5 × 5 fold or finer; (2) when the sample is large;
(3) when the two variables may legitimately be classified into
categories; and (4) when we are justified in assuming that the
variables under investigation are normally distributed.


(2) Method B for Calculating C

The arithmetic involved in computing C may be lessened somewhat by combining the twofold process of (1) calculating independence values and (2) dividing the square of each cell frequency by its independence value. This method is illustrated in Table 58. The first occupied cell in the first column of the table has a frequency of 1 and an independence value of (8 × 99)/384; hence the cell frequency squared and divided by the independence value is $\frac{1^2 \times 384}{8 \times 99}$. This fraction is the contribution of this particular cell to the total S. In the same way, the contribution to S of the next cell in this column is found to be $\frac{5^2 \times 384}{8 \times 25}$, and of the third and last cell, $\frac{2^2 \times 384}{8 \times 2}$. These contributions from column 1 may be combined to give $\frac{384}{8}\left(\frac{1^2}{99} + \frac{5^2}{25} + \frac{2^2}{2}\right)$. The contribution of each of the other five columns to S may be found in like manner. Moreover, since N (i.e., 384) is a common factor in each column, it may be left out of the computations entirely in calculating the contribution of each cell, as shown in Table 58. Then if the sum of all six columns is denoted by P (so that S = NP, and formula (71) reduces to the form below),

$$C = \sqrt{\frac{P - 1}{P}} \qquad (72)$$

(alternate method of calculating C)


TABLE 58

TO ILLUSTRATE THE CALCULATION OF C BY METHOD B
Boys: Ages 4-5 Years

                            Weight in Pounds
Height (in.)  24-28  29-33  34-38  39-43  44-48  49-53  Totals
45-47                       1             2             3
42-44                       4      35     21     5      65
39-41                5      87     90     7      1      190
36-38         1      18     72     8                    99
33-35         5      15     5                           25
30-32         2                                         2
Totals        8      38     169    133    30     6      384

Column 1: (1/8)(1²/99 + 5²/25 + 2²/2) = .3762
Column 2: (1/38)(5²/190 + 18²/99 + 15²/25) = .3264
Column 3: (1/169)(1²/3 + 4²/65 + 87²/190 + 72²/99 + 5²/25) = .5549
Column 4: (1/133)(35²/65 + 90²/190 + 8²/99) = .4671
Column 5: (1/30)(2²/3 + 21²/65 + 7²/190) = .2792
Column 6: (1/6)(5²/65 + 1²/190) = .0650
P = 2.0688
C = √((2.0688 − 1)/2.0688) = .72


* Pearson, Karl, “On the Measurement of the Influence of ‘Broad Categories’ on Correlation,” Biometrika, 9 (1913), 130; also see the discussion in Peters, C. C., and Van Voorhis, W. R., Statistical Procedures and Their Mathematical Bases (1940), pp. 391-393.


In Table 58, C equals .72 and the coefficient of correlation, r,
from the same table is .71 (see p. 305). The correspondence of
C and r here is very close, closer perhaps than is generally
to be expected, although the difference between the two coeffi-
cients is never very great when the conditions prescribed on
page 363 are met. In the present case, N is large, the classifica-
tion is 6 × 6 fold, and the distributions are fairly normal.
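Method B can be sketched the same way. This Python version leaves N out of the arithmetic exactly as the text describes: it sums f²/(row total) down each column and divides by the column total. The cell layout is a transcription of Table 58 (heights as rows, weights as columns), and the names are illustrative.

```python
from math import sqrt

def contingency_method_b(table):
    """Method B: for each column, sum f**2 / (row total) and divide by the
    column total; P is the sum over columns and C = sqrt((P - 1) / P)."""
    row_totals = [sum(row) for row in table]
    p = 0.0
    for col in zip(*table):
        col_total = sum(col)
        p += sum(f * f / r for f, r in zip(col, row_totals) if f) / col_total
    return p, sqrt((p - 1) / p)

# Table 58: rows = height 45-47 down to 30-32 inches,
# columns = weight 24-28 up to 49-53 pounds.
height_weight = [
    [0,  0,  1,  0,  2, 0],
    [0,  0,  4, 35, 21, 5],
    [0,  5, 87, 90,  7, 1],
    [1, 18, 72,  8,  0, 0],
    [5, 15,  5,  0,  0, 0],
    [2,  0,  0,  0,  0, 0],
]
p, c = contingency_method_b(height_weight)
print(round(p, 4), round(c, 2))  # 2.0689 0.72
```

The table reports P as 2.0688 because its column contributions are cut to four places; C is .72 either way.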


III. CURVILINEAR OR NON-LINEAR RELATIONSHIP


1. The Correlation Ratio
The relationship between the paired values of two sets of measures, X and Y, may be described in a general way as “linear” or “non-linear.” When the means of the arrays of the successive columns and rows in a correlation table follow straight lines (at least approximately), the regression is said to be linear or straight-line (p. 281). When the drift or trend of the means of the arrays (columns or rows) cannot be well described by a straight line, but can be represented by a curve of some kind, the regression is said to be curvilinear or, in general, non-linear.

Our discussion in Chapter IX was concerned entirely with linear relationship, the extent or degree of which is measured by the product-moment coefficient of correlation, r. It sometimes happens in mental measurement, however, that the relationship between two variables is definitely non-linear; and when this is true, r is not an adequate measure of the degree of correspondence or correlation. When the regression is non-linear, a curve joining the means of successive arrays (in the columns, say) will fit these mean values more exactly than will a straight line. Hence, should a truly curvilinear relationship be described by a straight line, the scatter or spread of the paired values about the regression line will be greater than the scatter about the better-fitting regression curve. The smaller the spread of the paired scores about the regression line or the regression curve which relates the variables X and Y (or Y and X), the higher the relationship between the two variables. For this reason, an r calculated from a correlation table in which the regression is curvilinear will always be less than the true relationship. An example will make this situation clearer. The correlation between the following two short series, as given by the product-moment formula, is r = .93 [formula (46), p. 289].

Variable X    Variable Y
    1             .25
    2             .50
    3            1.00
    4            2.00
    5            4.00

The true correlation between the two series, however, is clearly perfect, since changes in Y are directly related to changes in X. As X increases by 1 (i.e., in arithmetic progression), Y doubles (i.e., increases in geometric progression). The reason why r is less than 1.00 becomes obvious as soon as we plot the paired X and Y values. As shown in Figure 58, the relationship between X and Y is curvilinear, and is exactly described by a curve which passes through the successively plotted points. When linear relationship is forced upon these data, the plotted points do not fall along the straight line, and the product-moment coefficient, r, is less than 1.00. However, the correlation-ratio, or coefficient of non-linear relationship, $\eta$ (read as eta), for the given data is 1.00.
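The contrast between r and $\eta$ for these five pairs can be verified in a few lines of Python. In this sketch $\eta_{yx}$ is computed as the spread of the Y-array means (each weighted by the size of its array) relative to the total spread of Y; this is the standard definition of the correlation-ratio, written here in plain code rather than taken from the text's own working formulas. Because each X value has a single Y, the array means are the Y values themselves and $\eta$ is exactly 1.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [.25, .50, 1.00, 2.00, 4.00]

def pearson_r(xs, ys):
    """Product-moment r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

def eta_y_on_x(xs, ys):
    """eta_yx: spread of the Y-array means (weighted by array size) relative
    to the total spread of Y, both about the grand mean of Y."""
    my = sum(ys) / len(ys)
    arrays = {}
    for a, b in zip(xs, ys):
        arrays.setdefault(a, []).append(b)  # group the Y values by their X value
    between = sum(len(v) * (sum(v) / len(v) - my) ** 2 for v in arrays.values())
    total = sum((b - my) ** 2 for b in ys)
    return sqrt(between / total)

print(round(pearson_r(x, y), 2), round(eta_y_on_x(x, y), 2))  # 0.93 1.0
```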

Eta measures the concentration of paired X- and Y-values about a relation curve just as r measures the concentration of paired values about a relation line. Eta is a more general coefficient than r, as it is applicable when regression is linear as well as when it is non-linear. If the regression is linear and the means fall along straight lines, $\eta$ will equal r. But if regression is non-linear and the means lie along a curve, $\eta$ will be greater than r. The coefficient of correlation, therefore, is a limiting case of the more general coefficient, $\eta$, just as straight-line relationship is a limiting case of curvilinear relationship. There are always two $\eta$'s in every non-linear correlation table, just as there are always two regression coefficients, $r\frac{\sigma_y}{\sigma_x}$ and $r\frac{\sigma_x}{\sigma_y}$, in a table in which regression is linear. The first correlation-ratio, written $\eta_{yx}$, gives the regression of Y on X (Y is the dependent variable). The second correlation-ratio, written $\eta_{xy}$, gives the regression of X on Y (X is the dependent variable). (Compare the two regression equations (p. 310) in a correlation table in which relationship is linear.)

The correlation-ratio is always given the positive sign, and its value lies between .00 and 1.00. Whether the direction of the relationship given by $\eta$ is positive, negative, or a varying one must be determined by inspection of the correlation diagram.

Fig. 58. To Illustrate Non-Linear Relationship. (Y-variable plotted against X-variable, 1 to 5.)


2. The Calculation of $\eta$ in a Correlation Table

One of the most useful methods of calculating the two $\eta$'s ($\eta_{yx}$ and $\eta_{xy}$) in a correlation table in which the relationship is known (or suspected) to be non-linear is illustrated in Figure 59.* Ordinarily, one will wish to compare the two calculated $\eta$'s with the r obtained from the same data in order to determine whether regression is, or is not, significantly non-linear. For this reason, the computation of r is included in Figure 59 as part of the process of calculating the $\eta$'s. The steps to be followed in finding $\eta_{yx}$ may be outlined briefly. The method of calculating $\eta_{xy}$, shown on the right of the diagram, follows exactly the method outlined here for the calculation of $\eta_{yx}$ and hence will not be repeated.


Step 1

Construct a correlation table as shown in Figure 59, and described on page 276. Calculate $\sigma_y$ and $\sigma_x$, using the Assumed Mean method (p. 41).

Step 2

Determine the entries in the Σy' row. These entries are found, as described on page 286, by multiplying each frequency in a column by its deviation (i.e., its y') measured in units of class-interval from the Assumed Mean of the Y-distribution. To illustrate, in column one, reading down, we have (1 × −2) + (1 × −4) + (4 × −5) + (2 × −6), or −38. For column two, the Σy' entry is (1 × 2) + (2 × −1) + (2 × −2) + (2 × −3) + (1 × −4), or −14. Square each (Σy') entry to give the (Σy')² row. Then divide each entry in the (Σy')² row by its corresponding f to give the row (Σy')²/f. In column one, for example, divide 1444 by 8 to obtain 180.50; and in column two, divide 196 by 8 to obtain 24.50. The total of the (Σy')²/f row in Figure 59 is 281.93.

* For further discussion of the method here outlined, see Dvorak, “A Simplified Computation of Non-linear Correlation,” Journal of Educational Research, 25 (1932), 99-104; Holzinger, K. J., “A Combination Form for Calculating the Correlation Coefficient and Ratios,” Journal of the American Statistical Association, 18 (1923), 623-627.

Fig. 59. To Illustrate Non-Linear Regression and the Calculation of $\eta$. (Correlation between age and score on an achievement test; X = age, Y = score, N = 100.)


Step 3

From Σ[(Σy')²/f], $c_y$, N, and $\sigma_y$, calculate $\eta_{yx}$ by the following formula*:

$$\eta_{yx} = \frac{\sqrt{\dfrac{\Sigma\left[(\Sigma y')^2/f\right]}{N} - c_y^2}}{\sigma_y} \qquad (73)$$

(correlation-ratio, $\eta$, a measure of non-linear relationship in terms of the standard deviation of the means of the Y-arrays)

In Figure 59, Σ[(Σy')²/f] = 281.93; N = 100; $c_y^2$ = .40; and $\sigma_y$ = 2.02 (in units of interval). Substituting these values in formula (73), we obtain .77 as the value of $\eta_{yx}$.

The formula for $\eta_{xy}$, the second eta in a correlation table, is

$$\eta_{xy} = \frac{\sqrt{\dfrac{\Sigma\left[(\Sigma x')^2/f\right]}{N} - c_x^2}}{\sigma_x} \qquad (74)$$

(correlation-ratio, $\eta_{xy}$, a measure of non-linear relationship in terms of the standard deviation of the means of the X-arrays)

In the present problem, $\eta_{xy}$ = .42 (see Fig. 59 for calculations).

In most correlation tables, as illustrated here, the two $\eta$'s will differ in size, since their values depend upon the scatter about the curve joining the means of the Y-arrays, and the scatter about the curve joining the means of the X-arrays. In any particular problem, also, one correlation-ratio will ordinarily be of greater interest than the other, just as in linear correlation one regression equation is usually of greater interest than the other (p. 318). In Figure 59, $\eta_{yx}$ is obviously more valuable than $\eta_{xy}$, since it gives the change in score (Y) resulting from changes in age (X); Y is the dependent variable, and X is the independent variable (p. 314). The curve which describes the relation between age and score (the curve through the means of the Y-columns) has been sketched in on the correlation diagram. Note that this curve begins and ends low, reaching its peak in the middle of the age range. Both younger and older children in the grade make low scores, the highest scores being achieved by children in the middle of the age range. A probable reason for the obtained non-linear relationship between age and score is that the given test contains elements unfamiliar to, or inadequately learned by, the younger children, and items too difficult for the older (and probably duller) children. The best scores, therefore, are achieved by those in the middle of the age range. The product-moment r in Figure 59 is .26.

* There are several alternate formulas, equivalent to formula (73), which may be used in calculating $\eta$. See Peters, C. C., and Van Voorhis, W. R., Statistical Procedures and Their Mathematical Bases (1940), pp. 312-330; also Yule, G. U., and Kendall, M. G., An Introduction to the Theory of Statistics (12th ed., 1940), pp. 242-246.
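Steps 2 and 3 can be sketched in Python. The first function reproduces the Step 2 arithmetic for a single column from (y', f) pairs; the second applies formula (73) to the column totals. The two columns are the ones worked in Step 2, and the summary values (281.93, N = 100, $c_y^2$ = .40, $\sigma_y$ = 2.02) are those given for Figure 59. The function names are illustrative.

```python
from math import sqrt

def column_terms(cells):
    """Step 2 for one X-column.  cells = [(y_dev, f), ...], with y' in
    class-interval units from the Assumed Mean.  Returns (sum y', (sum y')**2 / f)."""
    f = sum(freq for _, freq in cells)
    sum_dev = sum(dev * freq for dev, freq in cells)
    return sum_dev, sum_dev ** 2 / f

def eta_yx(sum_sq_over_f, n, c_y_sq, sigma_y):
    """Step 3, formula (73): sqrt(sum / N - c_y**2) / sigma_y, in interval units."""
    return sqrt(sum_sq_over_f / n - c_y_sq) / sigma_y

print(column_terms([(-2, 1), (-4, 1), (-5, 4), (-6, 2)]))          # (-38, 180.5)
print(column_terms([(2, 1), (-1, 2), (-2, 2), (-3, 2), (-4, 1)]))  # (-14, 24.5)
print(round(eta_yx(281.93, 100, .40, 2.02), 2))                    # 0.77
```

In practice one would call `column_terms` on every column and sum the second elements to get the 281.93.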


3. The Standard Error of $\eta$

The SE of a correlation-ratio may be calculated by the formula

$$SE_\eta = \frac{1 - \eta^2}{\sqrt{N - 1}} \qquad (75)$$

(standard error of a correlation-ratio, $\eta$)

The SE of the $\eta_{xy}$ of .42 is .08, and of the $\eta_{yx}$ of .77 is .04. Both these coefficients are clearly significant (see p. 297).
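Formula (75) in Python, as a minimal sketch with an illustrative function name, reproduces both standard errors quoted above.

```python
from math import sqrt

def se_eta(eta, n):
    """Formula (75): SE of a correlation-ratio = (1 - eta**2) / sqrt(N - 1)."""
    return (1 - eta ** 2) / sqrt(n - 1)

print(round(se_eta(.42, 100), 2), round(se_eta(.77, 100), 2))  # 0.08 0.04
```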


4. The Correction of an Obtained $\eta$

The size of an obtained $\eta$ depends directly upon the fineness of grouping in the X- and Y-variables, as well as upon the size of N. When N is comparatively small and the number of arrays in X or Y is large, a correction* should be applied to the obtained $\eta$. The formula for a corrected $\eta$ is

$$\text{Corrected } \eta = \sqrt{\frac{\eta_{\text{obt}}^2 - \dfrac{k - 3}{N}}{1 - \dfrac{k - 3}{N}}} \qquad (76)$$

(correction of $\eta$ for fineness of grouping)

in which k equals the number of arrays in X or Y.

* Pearson, Karl, “On the Correction Necessary for the Correlation-Ratio η,” Biometrika, 14 (1923), 41…; see also Peters, C. C., and Van Voorhis, W. R., Statistical Procedures and Their Mathematical Bases (1940), pp. 312-325.

To illustrate, if we apply this correction to $\eta_{yx}$ obtained from Figure 59, we have, upon substituting .77 for $\eta_{yx}$, 8 for the number of Y-arrays (i.e., columns), and 100 for N,

$$\text{Corrected } \eta_{yx} = \sqrt{\frac{(.77)^2 - .05}{1 - .05}} = .76$$

The correction here is small, since $\eta_{yx}$ is large and the number of Y-arrays moderate. The correction which must be applied to $\eta_{xy}$ is larger. Thus, substituting .42 for $\eta_{xy}$, and 10 for the number of X-arrays (i.e., rows), we have

$$\text{Corrected } \eta_{xy} = \sqrt{\frac{(.42)^2 - .07}{1 - .07}} = .34$$

When $\eta$ is small and the grouping fine (i.e., data classified into many intervals), the correction given by formula (76) may be considerable, and hence should be made.
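Formula (76) can be sketched in Python as follows. One assumption is added that the text does not state: if the correction would make $\eta^2$ negative, the sketch reports .00 rather than failing on a negative square root.

```python
from math import sqrt

def corrected_eta(eta, k, n):
    """Formula (76): adjust eta for the number of arrays k and the sample size N."""
    adj = (k - 3) / n
    # Assumption: if the correction drives eta**2 below zero, report .00.
    return sqrt(max(0.0, (eta ** 2 - adj) / (1 - adj)))

print(round(corrected_eta(.77, 8, 100), 2),   # eta_yx, 8 columns
      round(corrected_eta(.42, 10, 100), 2))  # eta_xy, 10 rows
# prints: 0.76 0.34
```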


5. Test for Linearity of Regression

It is not always easy to tell from the appearance of a correlation table whether regression is linear or non-linear. In Figure 59, from the curve joining the means of the columns, it seems clear that the regression of Y on X, at least, is non-linear. Further evidence of non-linearity is offered by the fact that the coefficient of correlation calculated from Figure 59 is .26, very much smaller than the $\eta_{yx}$ of .77. As stated on page 367, when regression is strictly linear $\eta$ = r; and the greater the departure of the regression from linearity, in general the greater the discrepancy between $\eta$ and r.

A test for non-linearity of regression in terms of $\chi^2$ enables us to estimate the significance of the departure of curvilinear relationship from linear relationship. The formula for $\chi^2$ is

$$\chi^2 = (N - k)\,\frac{\eta^2 - r^2}{1 - \eta^2} \qquad (77)$$

($\chi^2$-test for linearity of regression)

in which N is size of sample and k is number of columns or rows. In Figure 59, $\eta_{yx}$ = .77, r = .26, and k = 8 (number of columns or Y-arrays). Hence from (77)

$$\chi^2 = (100 - 8)\,\frac{.59 - .07}{1 - .59} = 116.8$$

Entering Table 32 with k − 2, or 6 degrees of freedom, we obtain a P which is much smaller than .01. The probability is quite remote, therefore, that a deviation of $\eta_{yx}$ from r as large as that obtained (i.e., a $\chi^2$ of 116.8) could have arisen from sampling accidents. Hence we must abandon the hypothesis of linear relationship and accept the regression as curvilinear.

In Figure 59, the second eta, $\eta_{xy}$, is .42 and k = 10 (number of rows or X-arrays). From (77) we find

$$\chi^2 = (100 - 10)\,\frac{.18 - .07}{1 - .18} = 11.7$$

and entering Table 32 with k − 2 = 8, we get a P which lies between .30 and .20. The $\chi^2$ of 11.7, therefore, is not significant, and the regression of X on Y is probably rectilinear, or at least not markedly non-linear.
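Formula (77) is easy to apply in code. This Python sketch uses the unrounded $\eta$ and r. The critical value 16.812 (the .01 point of $\chi^2$ for 6 degrees of freedom) is taken from standard $\chi^2$ tables and is an assumption of the sketch, not a value quoted in the text.

```python
def chi2_linearity(eta, r, n, k):
    """Formula (77): chi-square test for linearity of regression, df = k - 2."""
    chi2 = (n - k) * (eta ** 2 - r ** 2) / (1 - eta ** 2)
    return chi2, k - 2

CHI2_01_DF6 = 16.812  # .01 point of chi-square for 6 df, from standard tables

chi2, df = chi2_linearity(.77, .26, 100, 8)
print(round(chi2, 1), df, chi2 > CHI2_01_DF6)  # 118.7 6 True
```

With unrounded inputs the statistic is about 118.7. The text's 116.8 comes from first rounding $\eta^2$ to .59 and r² to .07. Either value lies far beyond the .01 point, so the conclusion is the same.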

Summary on $\eta$ and r. True non-linear relationship is often encountered in psychophysics and in experiments dealing with fatigue, practice, forgetting, and learning. Whenever an experiment is carried on to the point of diminishing returns, relationship will be curvilinear. Mental and educational tests, when administered to large samples, exhibit linear or approximately linear relationships; and for this reason, r has been employed in psychology and education to a far greater extent than has $\eta$. If regression is significantly non-linear, it makes considerable difference whether $\eta$ or r is the measure of relation. But if the correlation is low and the regression not significantly curvilinear, r will give about as adequate a measure of relationship as $\eta$.

The coefficient of correlation has the advantage over $\eta$ in that, knowing r, we can write down at once the straight-line regression equation connecting X and Y or Y and X. This is not possible with the correlation ratio. In order to estimate one variable from another (say, Y from X) when regression is non-linear, a curve must be fitted to the means of the Y-columns. The equation of this curve then serves as a “regression equation” from which estimates can be made.*
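A minimal illustration of fitting such a curve, using the five-pair series of Figure 58, where each X has a single Y so the Y-column means are simply the Y values. The choice of an exponential curve, Y = a·bˣ, fitted by least squares to log Y, is an assumption made for this sketch because Y doubles at each step of X. It is not a method prescribed in the text, and other data would call for other curve forms.

```python
from math import exp, log

def fit_exponential(xs, ys):
    """Least-squares straight line through (x, ln y), returned as y = a * b**x."""
    n = len(xs)
    logs = [log(v) for v in ys]
    mx, ml = sum(xs) / n, sum(logs) / n
    slope = (sum((x - mx) * (lg - ml) for x, lg in zip(xs, logs))
             / sum((x - mx) ** 2 for x in xs))
    intercept = ml - slope * mx
    return exp(intercept), exp(slope)

a, b = fit_exponential([1, 2, 3, 4, 5], [.25, .50, 1.00, 2.00, 4.00])
print(round(a, 3), round(b, 3))  # 0.125 2.0
print(a * b ** 3)  # estimate of Y for X = 3 from the fitted "regression equation"
```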


PROBLEMS 


1. Compute the correlation between the following two series of test
scores by the rank-difference method and test its significance.

Individual   Intelligence Score   Cancellation Score
             (Army Alpha)         (A-Test + Number-Checking Test)
1            185                  110
2            203                  98
3            188                  118
4            195                  104
5            176                  112
6            174                  124
7            158                  119
8            197                  95
9            176                  94
10           138                  97
11           126                  110
12           160                  94
13           151                  126
14           185                  120
15           185                  118

[Note: The cancellation scores are in seconds; hence the two smallest
scores numerically (i.e., 94) are highest and are ranked 1.5 each.]

* Snedecor, G. W., Statistical Methods (1940), Chapter 14.


2. Check the product-moment correlations obtained in problems 6
and 7, pages 306-307, Chap. IX, by the rank-difference method.

3. The following data give the distributions of scores on the Thorn-
dike Intelligence Examination made by entering college freshmen
who presented 12 or more recommended units, and entering fresh-
men who presented less than 12 recommended units. Compute bi-
serial r by formula (66) and test its significance.

                   12 or more      Less than 12
Thorndike Scores   recommended     recommended
                   units           units
90-99              6               0
80-89              19              3
70-79              31              5
60-69              58              17
50-59              40              30
40-49              18              14
30-39              9               7
20-29              5               4
Totals             186             80


4. The following data give the distributions of scores on Army Alpha
made by those who answered 50% or more, and those who answered
less than 50%, of the items in test 2 (“Arithmetic”) correctly.
Compute biserial r and test its significance.

                Subjects answering    Subjects answering
Army Alpha      50% or more of the    less than 50% of the
Scores          items on test 2       items on test 2
                correctly             correctly
185-194         7                     0
175-184         16                    0
165-174         10                    6
155-164         35                    15
145-154         ...                   40
135-144         15                    26
125-134         10                    13
115-124         ...                   ...
105-114         0                     25
Totals          110


5. Compute the tetrachoric r's for the following tables, which show the
(1) Relation of alcoholism and health in 811 fathers and sons.
Entries are expressed as proportions.
(2) Correspondence of Yes and No answers to two items of a neu-
rotic inventory.

(1)                  Sons
Fathers              Unhealthy    Healthy    Totals
Non-Alcoholic        .343         .405
Alcoholic            .102         .151
Totals               .445         .556

(2)                  Question 1
Question 2           No           Yes        Totals
Yes                  83           187        270
No                   102          93         195
Totals               185          280        465


6. Calculate the coefficient of contingency, C, for each of the three
tables given below.

(1)            Marriage-Adjustment Score of Husbands
Education      Very Low   Low        High       Very High  Totals
Graduate work  4          9          38         54         105
College        20         31         55         99         205
High School    23         37         41         51         152
Grade School   11         10         11         19         51
Totals         58         87         145        223        513

(2)       Kind of Music Preferred
          English  French   German   Italian  Spanish  Totals
English   32       16       75       47       30       200
French    10       67       42       41       40       200
German    12       23       107      36       22       200
Italian   16       20       44       76       44       200
Spanish   8        53       30       43       66       200
Totals    78       179      298      243      202      1000

(3) Education by Salary. Rows (education): Post Graduate Work;
College Graduate; Business College; High School; Junior High;
Elementary School. Columns (salary): 0-900; 901-1200; 1201-2000;
2001-4000; 4001-10,000; 10,001-. (Cell entries not legible in the source.)


7. The following table shows the relationship between scores upon the
Thorndike Intelligence Examination and certain extra-curricular
activities of 102 Columbia College students.
(a) Compute $\eta_{yx}$ and $\eta_{xy}$, and the SE's.
(b) Find corrected values for both $\eta$'s.
(c) Test both $\eta$'s for linearity of regression.

(Correlation table: Thorndike Scores (X), class intervals 55-59 to
100-; extra-curricular activities (Y), class intervals 0-2 to 18-20;
N = 102. Cell entries not legible in the source.)


8. In the following table (a) calculate the two $\eta$'s and (b) test for
linearity of regression.

(Correlation table: Age in Months (X), class intervals 80-89 to
150-159; Y, class intervals 0-4 to 75-79; column frequencies 20, 21,
36, 64, 80, 112, 68, 64; N = 465. Cell entries not reliably legible
in the source.)


ANSWERS

1. ρ = .19. Not significant (Table 49).
3. $r_{bis}$ = .34, SE = .07; very significant, P < .01.
4. $r_{bis}$ = .47, SE = .07; very significant, P < .01.
5. (1) $r_t$ = −.09   (2) $r_t$ = .33
6. (1) C = .24   (2) C = …   (3) C = .70
7. (a) $\eta_{yx}$ = .43, SE = .08; $\eta_{xy}$ = .20, SE = .10
   (b) $\eta_{yx}$ (corrected) = .35; $\eta_{xy}$ (corrected) = .00
   (c) r = −.09. For $\eta_{yx}$, $\chi^2$ by (77) is 19.96, P < .01; departure
       from linearity significant. For $\eta_{xy}$, $\chi^2$ = 3.14, P lies between
       .70 and .50; departure from linearity not significant.
8. (a) $\eta_{yx}$ = .93, SE = .007; $\eta_{yx}$ (corrected) = .93
       $\eta_{xy}$ = .82, SE = .016; $\eta_{xy}$ (corrected) = .81
   (b) r = .78. For $\eta_{yx}$, $\chi^2$ = 849.1, P < .01; departure from line-
       arity very significant. For $\eta_{xy}$, $\chi^2$ = 81.72, P < .01; departure
       from linearity very significant.


CHAPTER XII 


THE RELIABILITY AND VALIDITY OF 
TEST SCORES 


I. THE RELIABILITY OF TEST SCORES


The reliability of a test, as of any measuring instrument, de-
pends upon the consistency with which it gauges the abilities of
those to whom it has been applied. When a test is reliable,
scores made by the members of a group upon retest with the
same test or with alternate forms of the same test will differ
very little or not at all from their original values. A reliable test
is relatively free of chance errors of measurement, and scores
earned on it are stable and trustworthy. If a subject scores 84,
say, on a reliable test, we feel confident that this score represents
very closely his true ability. Scores made on an unreliable test,
on the other hand, are subject to large errors of measurement and
are neither stable nor trustworthy. When a test is unreliable,
subsequent testings will reveal many discrepancies between
scores achieved by the same persons on different occasions.


1. Methods of Determining Test Reliability 

There are three procedures in common use for determining
the reliability (sometimes called the self-correlation) of a test.
These are (1) the test-retest (repetition) method; (2) the al-
ternate or parallel forms method; and (3) the split-half method.
In addition to these three, a fourth method, the method of
“rational equivalence,” is also being widely used. All of
these procedures furnish “estimates” of the reliability of test
scores; sometimes one method and sometimes another will give
the best estimate.



(1) Test-Retest (Repetition) Method 

Repetition of a test is the simplest method of determining
reliability: the test is given and then repeated on the same
group, and the correlation is calculated between the first and
second sets of scores. While the test-retest method is sometimes
the only feasible procedure, it is open to various objections. If
the test is repeated immediately, many subjects will recall their
first answers and spend their time on new material, thus in-
creasing their scores. Besides the memory effect, practice and
the confidence induced by familiarity with the material will
almost certainly affect scores when one takes a test for the
second time. Transfer effects are likely to be different from
person to person. If the net effect of transfer is to make for
closer agreement between scores achieved on the first and second
giving of a test than would otherwise be the case, the reliability
coefficient will be too high. When a sufficient time interval
has elapsed between the first and second administrations of the
test to offset (in part, at least) memory, practice, and other
effects, the reliability coefficient will be a closer estimate of the
actual consistency of test scores. If the interval between tests
is long, however (say, six months or so), and the subjects are
children, growth or maturity changes will affect the retest.
The test-retest method will estimate less accurately the re-
liability of tests which contain novel features and which are
highly susceptible to practice than it will the reliability of
tests involving routine operations little affected by practice.
Because of the difficulty in controlling the conditions which
influence scores on different administrations of a test, the test-
retest method is used less generally than are the other two
methods.


(2) Alternate or Parallel Forms Method

When alternate or parallel forms of a test have been con-
structed, the correlation between Form A, say, and Form B is
taken as a measure of the self-correlation of the test. This
method is employed by the authors of most standard psycho-
logical and educational tests, for which alternate forms are usu-
ally available.

The alternate forms method is usually satisfactory if sufficient
time intervenes between the administration of the two forms
to weaken or eliminate memory and practice effects. When
Form B of a test follows Form A very closely, scores on the
second test will usually be increased through practice and
familiarity. When such increases are approximately constant
(say, three to five points for each score) the reliability coefficient
of the test will not be affected, since paired A and B scores
maintain their same relative positions in the two distributions.
When the mean increase due to practice has been determined, a
constant amount can be subtracted from Form B scores to make
them comparable to Form A scores.* In drawing up alternate
forms of a test, one should be careful to match test materials
for content, difficulty, and form; but one must be careful not to
make the test forms too much alike. If alternate forms are
practically identical, the reliability coefficient of the test will be
too high; while if parallel forms are not sufficiently “duplicate”
the reliability coefficient will be too low.
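The claim that a constant practice gain leaves the reliability coefficient unchanged can be checked numerically. The following is a minimal Python sketch with made-up scores (the data and the helper name `pearson_r` are hypothetical, not from the text): it adds the same 4 points to every Form B score, shows that the correlation with Form A does not move, and then subtracts the mean gain to put Form B back on the Form A scale.

```python
def pearson_r(x, y):
    """Product-moment correlation of two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores of six pupils on Form A and Form B of a test.
form_a = [42, 37, 55, 48, 30, 51]
form_b = [44, 36, 58, 47, 33, 52]

# Suppose practice raises every Form B score by the same 4 points.
form_b_practised = [b + 4 for b in form_b]

print(round(pearson_r(form_a, form_b), 4))
print(round(pearson_r(form_a, form_b_practised), 4))  # same value as above

# Subtract the mean gain so that Form B scores are on the Form A scale.
mean_gain = (sum(form_b_practised) - sum(form_a)) / len(form_a)
form_b_adjusted = [b - mean_gain for b in form_b_practised]
print(sum(form_b_adjusted) / len(form_b_adjusted), sum(form_a) / len(form_a))
```

Only a uniform shift is modeled here; if the practice gain varied from person to person, the correlation would generally change.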


(3) The Split-half Method


In the split-half method the test is broken into two equiva- 
lent parts and the correlation of these half-tests is computed. 
From the half-test reliability, the self-correlation of the whole 
test is estimated by the Spearman-Brown formula described on 
page 388. 

The split-half method is employed when it is not feasible to
construct an alternate form of the test nor wise to repeat the 
test. This situation occurs with many performance tests, as 
well as with tests and questionnaires dealing with personality 
traits, attitudes, and the like. A performance test (e.g., picture 

* In the Otis Self-Administering Test of Mental Abilities, Higher
Examination, for instance, the author suggests that when Form B, which
is slightly more difficult than Form A, is given first, four points be added
to each score. This is to make scores equivalent to the norms for Form B
when this test is given after Form A, as it usually is. See Manual of
Directions, Otis S-A Test (1928), p. 2.




completion, puzzle solving, form board) is often a very different 
task when repeated, as the child is familiar with procedure and 
content. Likewise, many personality tests cannot be given in 
alternate form nor repeated because of radical changes in the 
subject's attitude and interests when taking such tests for the
second time. 

The split-half method is generally regarded as the best of the 
methods for determining test reliability. Perhaps its main ad- 
vantage is that all of the data for determining test reliability 
are obtained upon one occasion; hence variations introduced by
differences between the two testing situations are eliminated. 
A disadvantage to the split-half method is that chance errors 
may affect the scores on both halves of the test in the same way, 
thus tending to make the reliability coefficient too high. The 
longer the test, the less the probability that the effects of tem-
porary and variable disturbances will be cumulative and in one
direction, and the more accurate the estimate of reliability. 

Objection has been raised to the split-half method on the
ground that a test can be divided into two parts in a variety of
ways so that the reliability coefficient is not a unique value.
This criticism is strictly true only when items are of equal diffi-
culty. When items are in strict order of merit from least to most
difficult, the split into odds and evens gives a unique determi-
nation of the reliability coefficient.


(4) The Method of "Rational Equivalence" 

The method of rational equivalence* represents an attempt to
get an estimate of the reliability of a test, free from the objec-
tions raised against the methods outlined above. Two forms of
a test are defined as “equivalent” when corresponding items a,
A, b, B, etc., are interchangeable, and when the inter-item cor-
relations are the same for both forms. The method of rational


* Kuder, G. F., and Richardson, M. W., “The Theory of the Estima-
tion of Test Reliability,” Psychometrika, 2 (1937), 151-160.
Richardson, M. W., and Kuder, G. F., “The Calculation of Test Relia-
bility Coefficients Based upon the Method of Rational Equivalence,”
Journal of Educational Psychology, 30 (1939), 681-687.




equivalence stresses the intercorrelations of the items in the test 
and the correlations of the items with the test as a whole. Four 
formulas for determining test reliability have been derived, of 
which the one given below is perhaps the most useful: 


$$r_{11} = \frac{n}{n-1} \times \frac{\sigma_t^2 - \sum pq}{\sigma_t^2} \qquad (78)$$

(reliability coefficient of a test in terms of the difficulty
and the intercorrelations of test items)
in which:
$r_{11}$ = reliability coefficient of whole test;
n = number of items in the test;
$\sigma_t$ = the SD of the test scores;
p = the proportion of the group answering a test item cor-
rectly;
q = (1 − p) = the proportion of the group answering a test
item incorrectly.
To apply formula (78) the following steps are necessary:

Step 1
Compute the SD of the test scores for the whole group,
namely, $\sigma_t$.

Step 2
Find the proportions passing each item (p) and the proportions
failing each item (q).

Step 3
Multiply p and q for each item and sum for all items. This
gives $\sum pq$.

Step 4
Substitute the calculated values in formula (78).

To illustrate, suppose that a test of sixty items has been ad-
ministered to a group of eighty-five subjects; $\sigma_t$ = 8.50 and
$\sum pq$ = 12.43. Applying (78) we have




$$r_{11} = \frac{60}{59} \times \frac{72.25 - 12.43}{72.25} = .842$$


which is the reliability coefficient of the test. 
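Steps 1 to 4 translate directly into a short routine. The Python sketch below is one way to do it under stated assumptions: responses are scored 1 (right) or 0 (wrong), the SD is computed with N in the denominator, and the function names are hypothetical. The first call reproduces the sixty-item example; the second applies the four steps to a tiny invented response matrix.

```python
def kr20_from_summary(n_items, sd_total, sum_pq):
    """Formula (78): r11 = n/(n - 1) * (sigma_t**2 - sum pq) / sigma_t**2."""
    var_t = sd_total ** 2
    return n_items / (n_items - 1) * (var_t - sum_pq) / var_t


def kr20_from_items(responses):
    """Apply Steps 1-4 to a persons x items matrix of 1 (right) / 0 (wrong)."""
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    mean_t = sum(totals) / n_persons
    # Step 1: SD of the total scores (N in the denominator; an assumption).
    sd_t = (sum((t - mean_t) ** 2 for t in totals) / n_persons) ** 0.5
    # Steps 2 and 3: p and q = 1 - p for each item, then the sum of pq.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_persons
        sum_pq += p * (1 - p)
    # Step 4: substitute in formula (78).
    return kr20_from_summary(n_items, sd_t, sum_pq)


print(round(kr20_from_summary(60, 8.50, 12.43), 3))  # 0.842

# Invented responses: 4 persons x 3 items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20_from_items(toy), 2))
```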

A simple approximation to formula (78) has been devised.* 
This formula is useful to teachers and others who want to de- 
termine quickly the reliability of short objective classroom ex- 
aminations or other tests. It reads: 

$$r_{11} = \frac{n\sigma_t^2 - M(n - M)}{\sigma_t^2 (n - 1)} \qquad (79)$$

[approximation to formula (78)]

in which

$r_{11}$ = reliability of the whole test;
n = number of items in the test;
$\sigma_t$ = SD of the test scores;
M = the mean of the test scores.


Formula (79) is a labor saver since only the mean, SD, and
number of items in the test need be known in order to get an
estimate of reliability. The correlation need not be computed
between alternate forms or between halves of the test. Suppose
that an objective test of forty multiple-choice items has been
administered to a small class of students. An item answered
correctly is scored 1, an item answered incorrectly is scored 0.
The mean test score is 25.70 and $\sigma_t$ = 6.00. What is the reli-
ability coefficient of the test? Substituting in (79), we have

$$r_{11} = \frac{40 \times 36.00 - 25.70(40 - 25.70)}{36.00 \times 39} = .76$$
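Because formula (79) needs only three summary numbers, a sketch of it is brief. The function name below is hypothetical; the call repeats the forty-item example.

```python
def kr21(n_items, mean_t, sd_t):
    """Formula (79): r11 = (n*sigma_t**2 - M*(n - M)) / (sigma_t**2 * (n - 1))."""
    var_t = sd_t ** 2
    return (n_items * var_t - mean_t * (n_items - mean_t)) / (var_t * (n_items - 1))


print(round(kr21(40, 25.70, 6.00), 2))  # 0.76
```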

The assumption is made in formula (79) that all test items
have the same degree of difficulty, i.e., that the same proportion
of subjects (but not necessarily the same persons) pass each item.
In a power test items are never of equal difficulty. Formula
(79) will give a satisfactory approximation to the test's reli-
ability, however, even when the test items cover a wide range of
difficulty. Formula (79) always underestimates to a slight
degree the reliability of a test as found by the split-half tech-
nique and the Spearman-Brown formula, and the more widely
items vary in difficulty the greater the underestimation. This
formula provides a minimum estimate of reliability: we may
feel sure that the test is at least as reliable as we have found it
to be by (79).

* Froelich, G. J., “A Simple Index of Test Reliability,” Journal of
Educational Psychology, 32 (1941), 381-385.

Formulas (78) and (79) are not strictly comparable to the
three methods for determining the reliability of test scores given
above. In a sense, these formulas provide an estimate of the
internal consistency of the test rather than an estimate of the
dependability of test scores. The method of rational equiva-
lence is superior to the split-half technique in certain theoretical
aspects, but differences in reliability as found by the two methods
are never very large (of the order of .02 or so). Formula (79) is
often to be preferred to the split-half method because of the
time and calculation it saves rather than for other reasons.


2. Factors Influencing the Reliability of Test Scores: Chance
and Constant Errors

Many factors affect the reliability of a test besides fluctuations
in interest and attention, shifts in emotional attitude, and the
differential effects of memory and practice. To these “psycho-
logical” factors must be added environmental disturbances
such as distractions, noises, interruptions, errors in scoring, and
the like. All of these variable influences (environmental and
psychological) are subsumed under the head “chance errors.”
Errors, to be truly “chance,” must influence a score in such a
way as to cause it to vary above its “true” value as often as
below it. The reliability coefficient is a quantitative es-
timate of the importance of chance or variable influences upon
test scores.

Constant errors, as distinguished from chance errors, work
in only one direction. Constant errors may raise or lower all
of the scores on a retest or on the alternate forms of the test,
but will not affect the reliability coefficient. If every paper on
Form B of a test is scored 5 points too high, for example, the
self-correlation of the test (i.e., the correlation between Form A
and Form B) will not be affected, but all of the scores on the
second form will be in error by 5 points.

How high should the self-correlation of a test be in order for 
the reliability of the test to be considered satisfactory? This 
is an important question, and its answer depends upon the 
nature of the test, the size and variability of the group tested, 
and the purpose for which the test was given. To distinguish 
reliably between the means of two relatively small groups of 
narrow range of ability (for example, a fifth grade and a sixth 
grade) a reliability coefficient need be no higher than .50 or .60. 
If the test is to be used to differentiate among the individuals 
in the group, however, its reliability should be .90 or more. 
Most of the authors of intelligence tests and educational achieve-
ment examinations report correlations of .90 or more between
alternate forms of their tests. Since the self-correlation of a 
test is directly affected by the variability within the group, in 
reporting a test’s reliability coefficient the standard deviation 
of the group should always be given. 


3. The Effect upon Reliability of Lengthening or Repeating a
Test
(1) The Reliability Coefficient from Many Applications or
Repetitions of a Given Test
The mean of five determinations of height will, in general,
be more reliable than a single determination (p. 183), and
the mean of ten determinations will (in general) be more reliable
than the mean of five. On the same principle, increasing the
length of the test, or averaging the results obtained from several
applications of the test, or from alternate forms, will tend to
increase reliability. If the self-correlation of a test is not satis-
factory, what will be the effect of doubling or tripling the test's
length? To answer this question experimentally would require
considerable time and labor. Fortunately, a good measure of
the effect of lengthening or repeating a test may be obtained
from the Spearman-Brown “prophecy formula”:


$$r_{nn} = \frac{n\,r_{11}}{1 + (n-1)\,r_{11}} \qquad (80)$$

(Spearman-Brown formula for estimating the correlation
between n forms of a test, and n other similar forms)

in which
$r_{nn}$ = the correlation between n forms of a test and n alternate
forms (or the mean of n forms against the mean of n other
forms);
$r_{11}$ = the reliability coefficient.

The subscripts (“11”) show that the correlation is between two
forms of the same test.

To illustrate the use of formula (80), suppose that in a group
of 100 adults the self-correlation of a test is .70. What will be
the effect upon test reliability of tripling the length of the test?
Substituting $r_{11}$ = .70 and n = 3 in formula (80) and solving for
$r_{nn}$, we have

$$r_{nn} = \frac{3 \times .70}{1 + 2 \times .70} = \frac{2.10}{2.40} = .88$$


Tripling the test’s length, therefore, increases its reliability co- 
efficient from .70 to .88. Instead of tripling the length of the 
test we could give three parallel forms of the test and average 
the three scores made by each person. The reliability of these 
mean scores (each based upon three measures) will be the same, 
as far as purely statistical factors are concerned, as the reli-
ability got by tripling the length of the test.
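Formula (80) can be tabulated for several lengthenings at once. In this minimal Python sketch (the function name `spearman_brown` is hypothetical), n may be any positive number of parallel forms or any lengthening factor; the first printed value repeats the tripling example, which the text rounds to .88.

```python
def spearman_brown(r11, n):
    """Formula (80): reliability of a test n times as long (or of the mean of n forms)."""
    return n * r11 / (1 + (n - 1) * r11)


print(round(spearman_brown(0.70, 3), 3))  # 0.875

# Reliability of the same test at one to five times its length.
for n in range(1, 6):
    print(n, round(spearman_brown(0.70, n), 2))
```

As the following paragraphs caution, these figures hold only if the added material is comparable to the original items.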

The prophecy formula may also be used to find how many
times a test should be repeated in order for test scores to reach
a given standard of reliability. Suppose that the self-correlation
of a test is .80. How much will the test have to be lengthened,
or how many times repeated, in order to insure a reliability
coefficient of .95? Substituting $r_{11}$ = .80 and $r_{nn}$ = .95 in the
formula, and solving for n, we have

$$.95 = \frac{.80n}{1 + .80n - .80} = \frac{.80n}{.20 + .80n}$$

and

n = 4.75, or 5 in whole numbers


The test must be five times its present length, therefore, or five 
alternate forms must be given and averaged, before the self- 
correlation of the test will reach .95. 
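Solving formula (80) for n in general gives $n = r_{nn}(1 - r_{11}) / [r_{11}(1 - r_{nn})]$, which is the algebra carried out above for .80 and .95. A sketch of that rearrangement follows (the function name is hypothetical); it rounds up, because the reliability must reach at least the target and a test cannot be given a fractional number of times.

```python
import math


def lengthening_factor(r11, target):
    """Solve formula (80) for n: n = target*(1 - r11) / (r11*(1 - target))."""
    return target * (1 - r11) / (r11 * (1 - target))


n = lengthening_factor(0.80, 0.95)
print(round(n, 2), math.ceil(n))  # 4.75 5
```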

Predictions of test reliability by the Spearman-Brown for-
mula are valid only when the items or questions added to the
test cover the same ground, are of equal range of difficulty, and
are comparable in other respects to the items of the original test.
When these conditions are satisfied, there would appear to be
no reason, as far as the mathematical process is concerned, why
we could not boost the self-correlation of a test to any desired
figure, simply by continuing to increase its length or by con-
tinuing to repeat it. But it is highly improbable that the re-
liability coefficient of a test could be so increased indefinitely.
In the first place, it is impracticable if not impossible to increase
a test's length, say, ten or fifteen times. Furthermore, beyond
a certain point, boredom, fatigue, loss of incentive, and the like
inevitably affect our results and lead to “diminishing returns.”
When the material added to the test is strictly comparable to
the original test items, and when motivation remains substan-
tially constant, the experimental evidence* indicates that a test
may be increased to six or seven times its original length, and
the Spearman-Brown formula will still give a close estimate of
empirically determined results. But after the first four or five
lengthenings the prophecy formula may “over-predict,” that is,
give higher estimated reliabilities than those obtained by actual
calculation. This is not an especially serious drawback, however,
as a test which needs so much lengthening in order to yield

* Holzinger, K. J., and Clayton, B., “Further Experiments in the
Application of Spearman's Prophecy Formula,” Journal of Educational
Psychology, 16 (1925), 289-299.
Ruch, G. M., Ackerson, Luton, and Jackson, J. D., “An Empirical
Study of the Spearman-Brown Formula as Applied to Educational Test
Material,” Journal of Educational Psychology, 17 (1926), 309-313.




reliable results should be radically changed in form or content, 
or better still, perhaps, discarded in favor of another test. 

The Spearman-Brown formula may be applied to ratings,
judgments, and other estimates as well as to test items. When
measuring the reliability of a personality rating scale, for in-
stance, by correlating the ratings made by two equally com-
petent judges, we may employ the prophecy formula to estimate
the increased reliability which might be expected if there were
four, six, or more judges.*


(2) The Reliability Coefficient from One Application of a Test

When a test has no alternate form and cannot well be re-
peated, we may calculate the reliability of half of the test and
then proceed to estimate the reliability of the whole test by the
Spearman-Brown formula. This method is called the “split-
half technique” (p. 382). The procedure is to make up two sets
of scores by combining, say, alternate exercises or items in the
test. The first set of scores represents, for example, performance
on the odd-numbered items, 1, 3, 5, 7, etc.; and the second set
of scores performance on the even-numbered items, 2, 4, 6, 8,
etc. Other ways of making the two halves of the test as com-
parable as possible in content, difficulty, and susceptibility to
practice may be employed, but the method described is the one
most commonly used. From the self-correlation of the half test,
the reliability coefficient of the whole test may be estimated from
the formula

$$r_{11} = \frac{2\,r_{\frac12\frac12}}{1 + r_{\frac12\frac12}} \qquad (81)$$

(Spearman-Brown formula for estimating reliability
from two comparable halves of a test)
in which
$r_{11}$ = the reliability coefficient of the whole test;
$r_{\frac12\frac12}$ = the reliability coefficient of one-half of the test, found
experimentally.


* … Applied to Ratings of Personality Traits,” Journal of Educational
Psychology, 26 (1935), 552-555.
Remmers, H. H., Shock, N. W., and Kelly, E. T., “An Empirical Study
of the Validity of the Spearman-Brown Formula as Applied to the Purdue
Rating Scale,” Journal of Educational Psychology, 18 (1927), 187-195.




When the reliability coefficient of one-half of a test ($r_{\frac12\frac12}$) is .60,
it follows from formula (81) that the reliability of the whole
test ($r_{11}$) is .75.
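The odd-even procedure and formula (81) can be combined in one routine. This Python sketch assumes the items are stored in their printed order in a persons-by-items list of 0/1 scores; `split_half_reliability` and `step_up_half` are hypothetical names. The toy matrix is invented so that every person's odd and even half-scores agree exactly, which drives the estimate to 1.00.

```python
def pearson_r(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def step_up_half(r_half):
    """Formula (81): whole-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """Odd-even split of a persons x items 0/1 matrix, then formula (81)."""
    odd = [sum(person[0::2]) for person in responses]   # items 1, 3, 5, ...
    even = [sum(person[1::2]) for person in responses]  # items 2, 4, 6, ...
    return step_up_half(pearson_r(odd, even))


# Invented responses: 4 persons x 4 items (1 = right, 0 = wrong).
toy = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
print(split_half_reliability(toy))
print(round(step_up_half(0.60), 2))  # 0.75, as in the example above
```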


4. The Index of Reliability

An individual's “true score” on a test (p. 181) is defined as
the mean of a very large number of determinations made of the
given person on the same test or parallel forms of the test
administered under approximately identical conditions. The
correlation between a series of obtained scores and their
corresponding theoretically “true” scores may be found by the
formula

$$r_{1\infty} = \sqrt{r_{11}} \qquad (82)$$

(correlation between obtained scores on a given test and
true scores in the function measured by the test)
in which
$r_{11}$ = the reliability coefficient of the given test;
$r_{1\infty}$ = the correlation between obtained and true scores.

The symbol “∞” (infinity) designates “true scores,” that is,
scores obtained from an “infinite” number of administrations
of the test to the same group.

The coefficient $r_{1\infty}$ is called the index of reliability; it measures
the trustworthiness of test scores by showing how well obtained
scores agree with their theoretically true counterparts. The
index of reliability gives the maximum correlation which the
given test is capable of yielding. This follows from the fact that
“the highest possible correlation which can be obtained (except
as chance might occasionally lead to higher spurious correlation)
between a test and a second measure is with that which truly
represents what the test actually measures, that is, the correla-
tion between the test and the true scores of individuals in just
such tests.”*

* Kelley, T. L., “The Reliability of Test Scores,” Journal of Educational
Research, 3 (1921), 327.

To illustrate the application of the index of reliability, sup-
pose that for a given test the self-correlation is .64. Then
$r_{1\infty} = \sqrt{.64}$ or .80; and .80 is the highest correlation of which
this test is capable, since it represents the relationship between
obtained test scores and true test scores.
If the self-correlation of a test is only .25, so that $r_{1\infty} = \sqrt{.25}$
or .50, it is obviously a waste of time to continue using this test
without lengthening or otherwise improving it. A test whose
index of reliability is only .50 is an extremely poor estimate of
the function which it is trying to measure.
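Formula (82) is a single square root; a short sketch (the function name is hypothetical) reproduces both cases just discussed.

```python
import math


def index_of_reliability(r11):
    """Formula (82): correlation between obtained and true scores, sqrt(r11)."""
    return math.sqrt(r11)


for r11 in (0.64, 0.25):
    print(r11, round(index_of_reliability(r11), 2))
```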


5. The Standard Error of an Obtained Score
The effects of variable or chance errors in producing diver-
gencies of obtained scores from their true counterparts may be
estimated by the formula

$$\sigma_{1\infty} = \sigma_1\sqrt{1 - r_{11}} \qquad (83)$$

(standard error of an obtained score)
in which
$\sigma_{1\infty}$ = the standard error of an obtained score (sometimes
called the “standard error of measurement”);
$\sigma_1$ = the standard deviation of the test scores;
$r_{11}$ = the reliability coefficient of the test.


The subscript “1∞” indicates this standard deviation to be a
measure of the error made in taking an obtained score (i.e., 1)
as an estimate of the true score (i.e., ∞). To illustrate the use of
$\sigma_{1\infty}$, suppose that in a group of 300 college freshmen the relia-
bility coefficient of an aptitude test in mathematics is .92 and
the SD of this distribution is 15.00. From formula (83) we have

$$\sigma_{1\infty} = 15\sqrt{1 - .92} = 4.2, \text{ or 4 in whole numbers}$$

and the odds are 2:1 that the obtained score made by any in-
dividual in the group does not differ from his true score by more
than ±4 points. If subject AB has a score of 85, we may feel
confident (the chances are .95) that his score “actually” lies
between 77 and 93 (85 ± 1.96 × 4.2).* Generalizing for the en-
tire group, we should expect about two-thirds of the 300 scores
to be in error by 4 points or less, and the other one-third (or 100)
to be in error by more than 4 points.

* See page 185.
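Formula (83) and the ±1.96 band used for subject AB can be written as follows. This is a minimal sketch; the function name is hypothetical, and the band assumes, as the text's “chances are .95” does, that errors of measurement are normally distributed.

```python
import math


def standard_error_of_measurement(sd, r11):
    """Formula (83): sigma_1inf = sigma_1 * sqrt(1 - r11)."""
    return sd * math.sqrt(1 - r11)


se = standard_error_of_measurement(15.00, 0.92)
score = 85
low, high = score - 1.96 * se, score + 1.96 * se
print(round(se, 1), round(low), round(high))  # 4.2 77 93
```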

The reader should note carefully the difference between $\sigma_{\text{est}}$
(see p. 320) and $\sigma_{1\infty}$. The first formula enables us to say with
what degree of assurance we can predict an individual's score
on one test when we know his score on a second (and usually a
different) test. The actual prediction of the most probable
score is made, of course, by way of the regression equation con-
necting the two variables (p. 317). The SE of an obtained
score, $\sigma_{1\infty}$, is also an estimating formula; it tells us how ade-
quately an obtained score represents the true score. Although
the true score is unknown, we can, nevertheless, tell from $\sigma_{1\infty}$
how much our obtained score probably misses the true value.
The SE of an obtained score is the best method of expressing the
reliability of a test, since it takes account of the self-correlation
of the test as well as of the variability within the group.

Formula (83) provides a general estimate of the SE of any
score over the entire range of the test. When the range is wide,
the agreement of scores on two forms of the test may differ con-
siderably at successive parts of the scale. To refine our estimate
of the reliability of our test scores, we may compute $\sigma_{1\infty}$ for
different levels of achievement. This has been done for the new
Stanford-Binet; the $\sigma_{1\infty}$ for I.Q.'s 130 and above, for example,
is 5.24; for I.Q.'s 90-109, 4.51; for I.Q.'s 70 and below, 2.21; etc.
The method is described in the references given below.*


* Terman, L. M., and Merrill, M. A., Measuring Intelligence (1937),
p. 46.
McNemar, Quinn, “The Expected Average Difference between Indi-
viduals Paired at Random,” Journal of Genetic Psychology, 43 (1933),
438-439.

6. The Dependence of the Reliability Coefficient upon the Size
and Variability of the Group

The reliability coefficient of a test administered to a small
group (a single grade, say) cannot be compared directly with
the reliability coefficient of the same test administered to a larger
group, e.g., to the children in several grades. The self-correla-
tion of a test (like any correlation coefficient) is affected by the
variability of the group; and the larger and more heterogeneous
the group, the greater test variability tends to be. If we know
the self-correlation of a test in a narrow range (ordinarily a
small group) we can estimate the self-correlation of the same
test in an increased range (ordinarily a larger group) by the
formula
$$\sigma_s\sqrt{1 - r_{ss}} = \sigma_l\sqrt{1 - r_{ll}} \qquad (84)$$

(relation between σ's and reliability coefficients ob-
tained in different ranges when the test is equally
effective throughout both ranges)
in which
$\sigma_s$ and $\sigma_l$ = the σ's of the test scores in the small and large
groups, respectively;
$r_{ss}$ and $r_{ll}$ = the reliability coefficients in the small and large
groups.

To illustrate the use of formula (84), suppose that for a single
fifth grade $r_{ss}$ = .50 and $\sigma_s$ = 5.00, and that for a larger group
made up of children from grades three to seven, $\sigma_l$ = 15.00.
Assuming our test to be as effective in the large group as in the
small, what is the reliability coefficient of the test in the large
group? If we substitute for $\sigma_s$, $\sigma_l$, and $r_{ss}$ in formula (84),
$r_{ll}$ = .94. This means that a reliability coefficient of .50 in the small
group indicates as high a degree of test consistency as a relia-
bility coefficient of .94 in a group in which the score range is
three times as wide.
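Squaring both sides of formula (84) and solving gives $r_{ll} = 1 - (\sigma_s/\sigma_l)^2(1 - r_{ss})$. The sketch below (the function name is hypothetical) assumes, like the formula itself, that the test is equally effective throughout both ranges; it reproduces the fifth-grade example.

```python
def reliability_in_new_range(r_small, sd_small, sd_large):
    """Solve formula (84), sd_s*sqrt(1 - r_ss) = sd_l*sqrt(1 - r_ll), for r_ll."""
    return 1 - (sd_small / sd_large) ** 2 * (1 - r_small)


print(round(reliability_in_new_range(0.50, 5.00, 15.00), 2))  # 0.94
```

The same function works in the other direction: passing the large-group values as the first two arguments and the small-group SD as the third estimates the reliability in the narrower range.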


II. THE VALIDITY OF TEST SCORES


The validity of a test, or of any measuring instrument,
depends upon the fidelity with which it measures whatever it
purports to measure. A homemade yardstick is valid when
measurements made by it are proved to be accurate by standard
measuring rods. And in the same way a test is valid when the
capacity which it gauges corresponds to the same capacity as


otherwise objectively measured and defined. The difference
between validity and reliability can be made clear, perhaps, by
an illustration. Suppose a clock is set forward twenty minutes.
If the clock is a good timepiece, the time it “tells” will be reliable
(i.e., consistent), but it will not be valid as judged by “standard
time.” The reliability of the measurements made by scales,
thermometers, yardsticks, chronoscopes, clocks, etc., is deter-
mined by making repeated measurements of the same facts;
and validity is determined by comparing the measures returned
by the given instrument with highly precise (if arbitrary)
“standard” measures. The reliability of mental measures is
found in the same way. But since precise and independent
“standards” (criteria) are rarely found in mental measurement,
the validity of a test can never be estimated as precisely as can
the validity of a thermometer or a rheostat.


1. The Determination of Validity through Correlation with a 
Criterion 

The validity of a test is determined directly, whenever pos-
sible, by finding the correlation between the test and some in-
dependent criterion. A criterion is an objective measure in
terms of which the value of the test is estimated or judged.
The criteria for evaluating a general intelligence examination,
for example, may be school marks, ratings for aptitude in learn-
ing, or some other test believed to be valid, such as Stanford-
Binet. A trade test may be validated against demonstrated
ability to carry on the required operations as shown in actual
performance.* A high correlation between a test and a criterion
is evidence of validity provided the test and the criterion are
both reliable. But before accepting criterion correlations, we
must know the reliability of the test and if possible the relia-
bility of the criterion.

When a criterion is not immediately available, indirect
methods may be utilized for estimating the validity of a test.


* Stead, W. H., and Shartle, C. L., Occupational Counseling Techniques 
(1949), Chapters 5 and 8 especially. 




We may, for example, compute the average correlation which each test in a battery shows with all of the other tests, and estimate the validity (i.e., the representativeness) of each test by the size of its correlations. Again, following essentially the same method, we may combine the scores on a number of tests designed to measure the same function (memory, say), and consider as most valid that test which correlates highest with the average of them all. Anastasi,* for example, found that of eight tests of immediate memory, the paired-associates test (geometric form paired against numbers) had the largest average correlation (.49) with the other tests of the battery. This test, then, is the most valid measure of the function tapped in common by all of the tests.
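A minimal Python sketch of the first procedure, assuming the battery's intercorrelations are already arranged in a square matrix. The matrix below is hypothetical (it is not Anastasi's data) and the function name is illustrative only; each test is scored by its mean correlation with the other tests.

```python
def mean_intercorrelations(r):
    """For each test i, average its correlations with every other test j != i."""
    k = len(r)
    return [sum(r[i][j] for j in range(k) if j != i) / (k - 1) for i in range(k)]


# Hypothetical 4-test battery; these coefficients are invented for illustration.
R = [
    [1.00, 0.50, 0.40, 0.30],
    [0.50, 1.00, 0.35, 0.25],
    [0.40, 0.35, 1.00, 0.20],
    [0.30, 0.25, 0.20, 1.00],
]

means = mean_intercorrelations(R)
# The test with the largest average intercorrelation is judged most representative.
best = max(range(len(R)), key=lambda i: means[i])
print([round(m, 2) for m in means], "most representative: Test", best + 1)
```

The same function serves for the second procedure if the "average of them all" is entered as an extra row and column of the matrix.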


2. The Correction for Attenuation 


The correlation between a test and its criterion will be reduced if either the test scores or the criterion scores or both are unreliable. In order to estimate the correlation between true scores in two variables, we need to make a correction which will take account of the unreliability in both sets of measures. Such a correction is given by the formula

$$r_{\infty\infty} = \frac{r_{12}}{\sqrt{r_{11} \times r_{22}}} \tag{85}$$

(correlation between true measures in Tests 1 and 2)

in which

$r_{\infty\infty}$ = correlation between true scores in Tests 1 and 2;
$r_{12}$ = correlation between obtained scores in Tests 1 and 2;
$r_{11}$ = reliability coefficient of Test 1;
$r_{22}$ = reliability coefficient of Test 2.


Formula (85) is the well-known correction for attenuation formula. It provides a correction for the effects of those chance or accidental errors in the two tests which lower the reliability coefficients of both tests and thus affect the correlation between them. To illustrate the application of formula (85), let the obtained correlation between two tests A and B be .60, the reliability coefficient of Test A be .80 ($r_{11}$), and the reliability coefficient of Test B be .90 ($r_{22}$). What is the correlation between Tests A and B freed of chance errors? Substituting the given values in formula (85), we have

$$r_{\infty\infty} = \frac{.60}{\sqrt{.80 \times .90}} = .71$$

as the estimated correlation between true scores in A and B. Our corrected coefficient of correlation represents the relationship which we should expect to obtain if our two sets of test scores were perfect measurements.

* Anastasi, A., A Group Factor in Immediate Memory, Archives of Psychology, No. 120 (1930), p. 41.
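Formula (85) is a one-line computation. The Python sketch below (the function name is hypothetical) reproduces the worked example.

```python
import math


def correct_for_attenuation(r12, r11, r22):
    """Formula (85): estimated correlation between true scores,
    r_inf = r12 / sqrt(r11 * r22)."""
    return r12 / math.sqrt(r11 * r22)


# Worked example from the text: r12 = .60, r11 = .80, r22 = .90
r_true = correct_for_attenuation(0.60, 0.80, 0.90)
print(round(r_true, 2))
```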

It is clear from formula (85) that correcting for chance errors will always raise the correlation between two tests unless both reliability coefficients are 1.00. Chance errors, therefore, always lower or attenuate an obtained correlation coefficient. The expression $\sqrt{r_{11} \times r_{22}}$ sets an upper limit to the correlation which we can obtain between two tests as they stand. In the example above, $\sqrt{.80 \times .90} = .85$; hence Tests A and B cannot correlate higher than .85, as otherwise their corrected r would be greater than 1.00.

Let us assume the correlation between first-year college grades and a general intelligence test to be .46; the reliability of the intelligence test to be .82; and the reliability of college grades to be .70. The maximum correlation which we could hope to obtain between these two measures is

$$r_{\infty\infty} = \frac{.46}{\sqrt{.82 \times .70}} \approx .61$$

Knowing that the correlation between grades and general intelligence, corrected for errors of measurement, has a probable maximum value of about .61 gives us a better notion of the "intrinsic" relationship between the two variables. At the same time, the investigator should remember that the $r_{\infty\infty}$ of .61 is a theoretical, not an obtained, value; that it gives an estimate of the relationship to be expected when the tests are more effective than they actually were in the present instance. If many sources of error are present so that considerable correction is necessary, it would be better experimental technique to improve the tests and the experimental conditions than to correct the obtained r.
The investigator must be careful how he applies formula (85) to correlations which have been averaged, as in such cases the reliability coefficients may be lower than the correlation between the two tests. When this happens, $r_{\infty\infty}$ is greater than 1.00. Such a result is logically and psychologically meaningless. If a corrected r is 1.00, or is only slightly greater than 1.00, however, it may be taken as indicating complete agreement between the two variables within the error of computation.
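The upper limit $\sqrt{r_{11} \times r_{22}}$ and the check for corrected values above 1.00 can be combined in a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and the `tolerance` for a result "only slightly greater than 1.00" is an arbitrary illustrative value, since the text gives no figure.

```python
import math


def attenuation_ceiling(r11, r22):
    """Largest obtained r two tests can show as they stand: sqrt(r11 * r22)."""
    return math.sqrt(r11 * r22)


def corrected_r_with_check(r12, r11, r22, tolerance=0.02):
    """Apply formula (85) and classify the result.

    `tolerance` is a hypothetical allowance for computational error; the
    text gives no numerical value for "only slightly greater than 1.00".
    """
    r = r12 / attenuation_ceiling(r11, r22)
    if r <= 1.0:
        return r, "ok"
    if r <= 1.0 + tolerance:
        return r, "complete agreement within computational error"
    return r, "meaningless: r12 exceeds the ceiling sqrt(r11 * r22)"


print(round(attenuation_ceiling(0.80, 0.90), 2))
print(corrected_r_with_check(0.46, 0.82, 0.70))
```

For the grades example, the helper returns a corrected r of about .607 with the label "ok"; an averaged r12 of .60 paired with reliabilities of .55 and .60 would be flagged as meaningless.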


3. The Estimation of the True σ of a Test

Chance or variable errors have a marked effect upon the standard deviation of a test as well as upon the r between tests. The relation of the σ calculated from obtained scores on a test to the σ of true scores on the same test is given by the formula

$$\sigma_\infty = \sigma_1 \sqrt{r_{11}} \tag{86}$$

(relation between true and obtained σ's for a set of test scores)

in which

$\sigma_\infty$ = the σ of the true test scores;
$\sigma_1$ = the σ of the obtained test scores;
$r_{11}$ = the reliability coefficient of the test.

Suppose an educational achievement test of seventy-five items has been administered to a group of fifty children. The obtained standard deviation, $\sigma_1$, is 10, and the reliability coefficient of the test ($r_{11}$) is .50. What is $\sigma_\infty$, the σ of the true scores from which variable or accidental errors have been eliminated? Substituting $\sigma_1 = 10$ and $r_{11} = .50$ in formula (86), we have

$$\sigma_\infty = 10\sqrt{.50} = 7.1$$

and the "true σ" of the test is about 7 points.

It is clear from (86) that $\sigma_\infty$ will always be smaller than $\sigma_1$ except in the improbable case in which $r_{11} = 1.00$. The effect of chance errors of measurement, then, is always to increase the spread (σ) of obtained test scores or of criterion scores.
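Formula (86) and its worked example, as a short Python sketch (the function name is hypothetical):

```python
import math


def true_sigma(sigma_obtained, r11):
    """Formula (86): sigma of true scores = obtained sigma * sqrt(r11)."""
    return sigma_obtained * math.sqrt(r11)


# Worked example: obtained sigma = 10, reliability = .50
print(round(true_sigma(10, 0.50), 1))
```

Because $\sqrt{r_{11}}$ is less than 1 whenever $r_{11} < 1.00$, the function returns a value below the obtained σ in every such case.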


4. Validation of a Test Battery*

A criterion of job efficiency, say, or of success in salesmanship, may be forecast by a battery consisting of four, five, or more tests. The validity of such a battery is determined by the multiple correlation coefficient, R, between the battery and the criterion. The weights to be attached to scores on the sub-tests of the battery are given directly by the regression coefficients (p. 421).

If the regression weights are small fractions (as they often are), whole numbers may be substituted for them with little if any loss in accuracy. For example, suppose that the regression equation joining the criterion and the tests in a battery reads as follows:

$$C\ (\text{criterion}) = 4.32X_1 + 3.12X_2 - .65X_3 + 8.35X_4 + K\ (\text{constant})$$

Dropping fractions and taking the nearest whole numbers, we have

$$C = 4X_1 + 3X_2 - 1X_3 + 8X_4 + K$$

Scores in Test 1 should be multiplied by 4, scores in Test 2 by 3, scores in Test 3 by −1, and scores in Test 4 by 8, in order to provide the best forecast of C, the criterion. The fact that Test 3 has a negative weight does not mean that this test has no value in forecasting C, but simply that the best estimate of C is obtained by giving scores in Test 3 a negative value.
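A sketch of substituting whole-number weights, assuming the forecast is simply the weighted sum of sub-test scores plus K. The sub-test scores are invented, and K is left at 0 because the example does not give its value; the function names are hypothetical.

```python
def round_weights(weights):
    """Replace fractional regression weights by the nearest whole numbers."""
    return [round(w) for w in weights]


def battery_forecast(scores, weights, k=0.0):
    """Weighted sum of sub-test scores plus the regression constant K."""
    return sum(w * x for w, x in zip(weights, scores)) + k


exact = [4.32, 3.12, -0.65, 8.35]   # weights from the example equation
whole = round_weights(exact)         # 4, 3, -1, 8, as in the text

# Hypothetical scores on Tests 1-4; K is left at 0 because the text gives no value.
scores = [10, 12, 5, 7]
print(battery_forecast(scores, exact), battery_forecast(scores, whole))
```

Python's `round()` rounds exact halves to the nearest even integer; that matters only for a weight ending exactly in .5, and none of the four weights here does.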


* See Chapter XIII.

III. ITEM ANALYSIS

In Section II above, we considered the validity of final test scores. The validity of a test score also depends directly upon the care with which the items in the test have been chosen. While the subject of item analysis properly belongs in a book on test construction, the main features of the process may be outlined here. Item analysis may be divided into three main topics: (1) item selection, (2) item difficulty, and (3) item validity.


1. Item Selection 


The initial choice of test items depends upon the judgment 
of competent persons as to the suitability of the material for the 
purposes of the test. Certain types of items, for instance, have 
proved to be generally useful in intelligence examinations. 
Problems in mental arithmetic, for example, vocabulary, anal- 
ogies, and number series completion, are often encountered; 
also, items requiring generalization, interpretation and the 
ability to see relations. The validity of most standard tests of 
educational achievement depends upon the consensus of teachers 
and other competent judges as to the adequacy of the items in- 
cluded. Courses of study, requirements for different grades, 
curricula from different sections of the country are carefully 
culled over by the test makers to determine what material in 
history, English, geography, etc., should be included in an educational achievement battery designed, say, for the seventh
grade. In its final form the educational achievement test repre- 
sents items carefully selected from all available sources of in- 
formation. 

Items used in personal data sheets, interest inventories, atti- 
tude scales and the like, also represent a consensus of experts 
as to the most diagnostic items in the areas sampled. 


2. Item Difficulty 


The difficulty of an item is determined by the proportion of some standard group able to solve the item correctly. The scaling of separate test items has been described in Chapter VI, page 146. When normality of distribution can be assumed for the ability being measured, single items or groups of items (scores) may be scaled, i.e., given difficulty values along a scale in terms of σ. It has been customary to select items for a test which vary in difficulty from easy to hard. The average person in the standardization group will then pass about one-half (50%) of the items in the test. It can be shown, however, that the sharpest discrimination between good and poor subjects is provided by items which are passed by 50% of the members of a group. A test made up of items all of which are passed by approximately 50% (but by different persons, of course) would theoretically be the most discriminating test. But it would be difficult to construct such an examination, and it is probable that a test made up of items covering a wider range of difficulty is psychologically a better measuring device. In standardizing a test, care must be taken that few, if any, subjects achieve perfect or zero scores, as in neither case is the person measured by the test.
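The Python sketch below illustrates two points in the paragraph, under stated assumptions. `sigma_difficulty` scales an item in σ units on the assumption that ability is normally distributed. `discriminations` uses one common argument for the 50% rule: an item passed by a proportion p of n persons separates pn passers from qn failers, i.e. pqn² pairs, which is largest at p = .50. Neither function is taken from the text, and this argument may not be the one the author has in mind.

```python
from statistics import NormalDist


def item_difficulty(n_passing, n_group):
    """Proportion p of the standard group passing the item."""
    return n_passing / n_group


def sigma_difficulty(p):
    """Difficulty in sigma units, assuming normally distributed ability:
    the z-score above which the proportion p of the group lies."""
    return NormalDist().inv_cdf(1 - p)


def discriminations(p, n):
    """Number of (passer, failer) pairs an item separates in a group of n."""
    return (p * n) * ((1 - p) * n)


print(round(sigma_difficulty(item_difficulty(16, 100)), 2))  # a hard item
best_p = max(range(1, 100), key=lambda k: discriminations(k / 100, 100)) / 100
print(best_p)
```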


3. Item Validity 

An often-used method of validating a test item is to determine whether the item discriminates between subjects differing sharply in the function being measured. This "criterion of internal consistency" admits into the final test or questionnaire only those items which have been found to separate high-scoring and low-scoring members of the group. In an internally consistent test, items "hang together" in the sense that they work in the same direction and measure the same common trait.* In one study,† eighty-six items were selected out of 222 on the basis of their ability to discriminate among the lower, middle, and upper thirds of the group. These eighty-six "good" items did a better job (higher reliability and validity) than a test nearly three times longer.

* Ferguson, G. A., "The Factorial Interpretation of Test Difficulty," Psychometrika, 6 (1941), 323-329.

† Anderson, J. E., "The Effect of Item Analysis upon the Discriminative Power of an Examination," Journal of Applied Psychology, 19 (1935), –244.

The validity of a single test item may also be determined by finding its correlation with total scores in the test of which it is a part, or by finding its correlation with scores in some independent criterion. The bi-serial method (p. 347) is the standard procedure for determining item validity through correlation. Application of bi-serial r to each item in a test requires considerable computation, however. For this reason various short-cut methods for selecting good items by formula and by graphical methods have been devised. References given below should be consulted.*
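A sketch of the biserial r for a single item, using the standard formula $r_{bis} = \frac{M_p - M_q}{\sigma_t} \times \frac{pq}{y}$, where $M_p$ and $M_q$ are the mean total scores of those passing and failing the item, $\sigma_t$ is the σ of all total scores, and y is the ordinate of the unit normal curve at the point dividing p and q. The data and the function name are hypothetical, and the presentation on p. 347 may differ in notation. In small or markedly non-normal samples the biserial r can exceed 1.00.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(item, total):
    """Biserial r between a pass/fail item (1 = pass, 0 = fail) and total scores.

    r_bis = (M_p - M_q) / sigma_t * (p * q / y), where y is the height of the
    unit normal curve at the point that divides the group into p and q.
    """
    passed = [t for i, t in zip(item, total) if i == 1]
    failed = [t for i, t in zip(item, total) if i == 0]
    if not passed or not failed:
        raise ValueError("need at least one subject passing and one failing")
    p = len(passed) / len(total)
    q = 1 - p
    y = NormalDist().pdf(NormalDist().inv_cdf(q))
    sigma_t = pstdev(total)  # sigma of the whole group's total scores
    return (mean(passed) - mean(failed)) / sigma_t * (p * q / y)


# Hypothetical data for ten subjects: item responses and total test scores.
item = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
total = [48, 45, 44, 40, 42, 35, 33, 41, 30, 36]
print(round(biserial_r(item, total), 2))
```

Reversing the scoring of the item (passes as failures) reverses the sign of r but leaves its size unchanged, which is a quick check on any implementation.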


* Long, John A., and Sandiford, Peter, The Validation of Test Items, Bulletin 3, 1935, University of Toronto, Department of Educational Research.
Flanagan, J. C., General Considerations in the Selection of Test Items, Journal of Educational Psychology, 30 (1939), 674-680.
Guilford, J. P., The Phi-coefficient and Chi-square as Indices of Item Validity, Psychometrika, 6 (1941), 11-19.
Richardson, M. W., and Adkins, D. C., A Rapid Method of Selecting Test Items, Journal of Educational Psychology, 29 (1938), 547-552.
Hawkes, H. E., Lindquist, E. F., and Mann, C. R., Achievement Examinations, 1936, Chaps. 2 and 3 especially.

PROBLEMS

1. The reliability coefficient of a test is .60.
(a) How much must this test be lengthened in order to raise the self-correlation to .90?
(b) What effect will doubling the test's length have upon its reliability coefficient? tripling the test's length?

2. A test of fifty items has a reliability coefficient of .78. What is the reliability coefficient
(a) of a test having 100 items comparable to the items in the given test?
(b) of a test having 125 comparable items?

3. A given test has a reliability coefficient of .80 and a σ of 20.
(a) What is the maximum correlation which this test is capable of yielding as it stands (see p. 391)?
(b) What is the standard error of a score obtained on this test?
(c) What is the estimated reliability coefficient of this test in a group in which the σ is 15?

4. A test of 100 items is given to a group of 225 subjects with the following results: M = 62.50; σ = 9.62.
(a) What is the reliability coefficient of the test by formula (79)?
(b) What is the estimated true σ of this test?
(c) What is the standard error of a score on this test?

5. Show (a) that when the reliability coefficient is zero, the standard error of an obtained score equals the standard deviation of the test; and (b) that when the reliability coefficient is 1.00, the standard error of an obtained score equals zero.

6. A mathematics test has a reliability coefficient of .82, and a mechanical ability test has a reliability coefficient of .76. The r between the two tests is .52.
(a) What would the correlation be if both tests were perfect measures?
(b) What is the maximum correlation possible with the mathematics test as it stands?
(c) What is the maximum correlation possible with the mechanical ability test as it stands?

7. An intelligence examination shows a correlation of .50 with first-year scholarship. The reliability coefficient of the test is .85, and that of school grades (i.e., the criterion) is .65. What is the highest validity coefficient which we can hope to get with this test (i.e., the corrected correlation between test and grades)?

8. A test of seventy-five items has a σ of 12.35. The $\Sigma pq = 16.46$. What is the reliability coefficient by formula (78)?


ANSWERS

1. (a) six times
   (b) $r_{11}$ = .75 (doubling length); $r_{11}$ = .82 (tripling length)
2. (a) .88
   (b) .90
3. (a) .89
   (b) 8.9
   (c) .64
4. (a) .75
   (b) 8.34
   (c) 4.81
6. (a) .66
   (b) .91
   (c) .87
7. .68
8. .90
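A selection of the answers (Problems 1, 2, 3, and 6) can be checked numerically. The formulas used here (the Spearman-Brown prophecy formula, the standard error of a score $\sigma\sqrt{1 - r_{11}}$, and the adjustment of reliability for a change in σ that holds error variance constant) are assumed to correspond to those cited on earlier pages, whose notation may differ. The function names are hypothetical.

```python
import math


def spearman_brown(r11, n):
    """Reliability of a test lengthened n times (Spearman-Brown)."""
    return n * r11 / (1 + (n - 1) * r11)


def times_lengthened(r11, target):
    """Factor by which a test must be lengthened to reach `target` reliability."""
    return target * (1 - r11) / (r11 * (1 - target))


def se_of_score(sigma, r11):
    """Standard error of an obtained score: sigma * sqrt(1 - r11)."""
    return sigma * math.sqrt(1 - r11)


def reliability_in_new_group(r11, sigma, sigma_new):
    """Reliability expected in a group whose sigma differs (same error variance)."""
    return 1 - sigma ** 2 * (1 - r11) / sigma_new ** 2


def correct_for_attenuation(r12, r11, r22):
    """Formula (85)."""
    return r12 / math.sqrt(r11 * r22)


checks = {
    "1(a)": times_lengthened(0.60, 0.90),
    "1(b) doubled": spearman_brown(0.60, 2),
    "1(b) tripled": spearman_brown(0.60, 3),
    "2(a)": spearman_brown(0.78, 2),
    "2(b)": spearman_brown(0.78, 125 / 50),
    "3(a)": math.sqrt(0.80),
    "3(b)": se_of_score(20, 0.80),
    "3(c)": reliability_in_new_group(0.80, 20, 15),
    "6(a)": correct_for_attenuation(0.52, 0.82, 0.76),
}
for label, value in checks.items():
    print(label, round(value, 2))
```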


CHAPTER XIII
PARTIAL AND MULTIPLE CORRELATION

I. THE MEANING OF PARTIAL AND MULTIPLE CORRELATION

Partial and multiple correlation represent an important extension of the theory and technique of simple or two-variable correlation to problems which involve three or more variables. In computing the correlation between two sets of scores, it is often desirable to allow for the influence of factors which, through their common relationship to the variables being correlated, obscure results or make them difficult to interpret. To illustrate, suppose that the correlation between intelligence test scores and chronological age in a large group of children, seven to fourteen years old, is .50; that the correlation between school achievement and age in the same group is .40; and that the correlation between intelligence and school achievement is .70. Since intelligence test scores and school achievement both increase with age (the correlations are .50 and .40), the correlation between these two measures will be raised when age is allowed to vary. The correlation coefficient of .70, therefore, is not only a measure of the role of intelligence in school achievement, but is a measure of the influence of intelligence plus the indirect effects of differences in age or maturity upon school achievement.

To discover the relationship between intelligence and school achievement, uninfluenced by maturity, we must rule out or control the factor of age. This could be accomplished experimentally by selecting children all of whom are of the same age. But this procedure offers many difficulties, the principal one being that it is well-nigh impossible to find a large sample of children of exactly the same age. It becomes necessary, then, to determine what age range is permissible; and the more closely we limit our group with respect to age, the smaller the number left. In fact, the experimental control of a variable by the method of selection may so limit the size of the group that correlations are of doubtful value.

Because of the difficulties which arise in attempting to control a variable (or variables) experimentally, the method of partial correlation is often employed. By this method the relationship between two variables can be determined when one or more related variables are held constant. Thus, the partial correlation between general intelligence and school achievement, i.e., the correlation with age "partialled out," gives us the correlation between these two variables uninfluenced by the factor of age differences. Such a partial coefficient represents the net correlation between general intelligence and school achievement for children of the same age, or the net correlation between intelligence and school achievement when age is a constant factor. Expressed in still another way, our partial coefficient tells us what relationship exists between general intelligence test scores and school achievement when differences in maturity no longer affect either variable.
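The text does not work out the net correlation for the age example, but it can be computed with the standard first-order partial correlation formula (the same formula appears later in this chapter as (87)). A Python sketch, with the function name and the labels a, b, c chosen here for illustration:

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """r_ab.c: correlation between a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2))


# a = intelligence, b = school achievement, c = chronological age
r_ab = 0.70  # intelligence with achievement
r_ac = 0.50  # intelligence with age
r_bc = 0.40  # achievement with age

net_r = partial_r(r_ab, r_ac, r_bc)
print(round(net_r, 2))
```

With these figures the partial r comes out at about .63, lower than the obtained .70, which agrees with the argument that letting age vary inflates the correlation.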

A second illustration of partial correlation may be helpful. A teacher finds in her class a correlation of .60 between test scores in history and arithmetic. In looking for an explanation of this correlation (since there is apparently little reason to expect a high relationship between these two abilities), she finds that achievement in arithmetic seems to depend in part upon ability to read and understand the problems. Obviously, ability to read well is also an important factor in determining achievement in history. Suppose that our teacher now calculates the correlations of the history and arithmetic tests with a third test of reading comprehension. Knowing these r's, she may determine (by methods given on p. 414) the net or partial correlation between history and arithmetic when the effects of reading comprehension have been allowed for. If this partial r is .30, say, considerably smaller than the "whole" coefficient (of .60) between history and arithmetic, the hypothesis that the apparent relationship was due in part to the common dependence of both tests upon reading is verified. When a factor (or factors) is "partialled out" from a given correlation, the effect is to eliminate the differences among individuals introduced by the variable thus controlled. The method of eliminating factor variability through partial correlation may be employed whenever the correlation can be computed between the factor or factors to be controlled and the two variables the net correlation of which we are seeking. Since all of the data are utilized, partial correlation has a decided advantage over experimental control in many problems.

In addition to its value as a means of controlling conditions by eliminating the effects of "disturbing" or other variables, partial correlation is useful in other ways. It enables us, for example, to build up a regression equation involving three or more variables from which a "criterion" score may be predicted when we know the scores made by a subject on several correlated tests. The accuracy of the regression equation in estimating criterion scores (its reliability as a "prediction" instrument) can be determined by the multiple coefficient of correlation. A multiple correlation coefficient gives the correlation between a single test or criterion on the one hand and a team of tests on the other. The meaning of the multiple coefficient of correlation will be better understood when the student has worked through an actual problem such as that given in Table 59.


II. AN ILLUSTRATIVE CORRELATION PROBLEM
INVOLVING THREE VARIABLES


Perhaps the most straightforward approach to an under- 
standing of the meaning of partial and multiple correlation, and 
of the techniques of calculation involved, is through the solu- 
tion of a problem. The present section, therefore, will show the 
application of partial and multiple correlation to a three-vari- 
able problem. Following this, the general formulas and further 
applications of the method will be considered. 




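TABLE 59
A CORRELATION PROBLEM INVOLVING THREE VARIABLES
(To illustrate partial and multiple correlation)

Step 1. Primary Data (N = 450)

$$\begin{array}{lll}
\text{(1) Honor Points} & \text{(2) General Intelligence} & \text{(3) Average Hours of Study per Week} \\
M_1 = 18.5 & M_2 = 100.6 & M_3 = 24 \\
\sigma_1 = 11.2 & \sigma_2 = 15.8 & \sigma_3 = 6 \\
r_{12} = .60 & r_{13} = .32 & r_{23} = -.35
\end{array}$$

Step 2. Calculation of Partial Coefficients of Correlation

$$r_{12.3} = \frac{.60 - .32(-.35)}{.9474 \times .9367} = .80 \tag{87}$$

$$r_{13.2} = \frac{.32 - .60(-.35)}{.8000 \times .9367} = .71 \tag{87}$$

$$r_{23.1} = \frac{-.35 - .60 \times .32}{.8000 \times .9474} = -.72 \tag{87}$$

Step 3. The Multiple Regression Equation

$$x_1 = b_{12.3}x_2 + b_{13.2}x_3 \quad \text{(Deviation Form)} \tag{89}$$

$$X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K \quad \text{(Score Form)} \tag{90}$$

in which $b_{12.3} = r_{12.3}\dfrac{\sigma_{1.23}}{\sigma_{2.13}}$ and $b_{13.2} = r_{13.2}\dfrac{\sigma_{1.23}}{\sigma_{3.12}}$ (93)

Step 4. Calculation of the Partial σ's

$$\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2} = 11.2 \times .8000 \times .7042 = 6.3 \tag{88}$$

$$\sigma_{2.13} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} = 15.8 \times .9367 \times .6000 = 8.9 \tag{88}$$

$$\sigma_{3.12} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2} = 6 \times .9367 \times .7042 = 4.0 \tag{88}$$

Step 5. Calculation of the Partial Regression Coefficients, and Partial Regression Equation

Substituting for $r_{12.3}$, $r_{13.2}$, $\sigma_{1.23}$, $\sigma_{2.13}$, and $\sigma_{3.12}$, we have

$$b_{12.3} = .80 \times \frac{6.3}{8.9} = .57; \qquad b_{13.2} = .71 \times \frac{6.3}{4.0} = 1.12$$

Hence the regression equation becomes:

$$x_1 = .57x_2 + 1.12x_3 \quad \text{(Deviation Form)}$$

$$X_1 = .57X_2 + 1.12X_3 - 66 \quad \text{(Score Form)}$$

Step 6. Calculation of the Standard Error of Estimate

$$\sigma_{(\text{est } X_1)} = \sigma_{1.23} = 6.3 \tag{96}$$

Step 7. Calculation of the Coefficient of Multiple Correlation

(98)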




The problem in Table 59 is taken from a study* of the factors which influence "academic success." In that part of the study from which the present data are drawn, the problem was to discover how accurately one can predict the academic success of freshmen from a knowledge of their general intelligence and of their study habits. Academic success was defined specifically as the number of credit or "honor" points obtained by a student at the end of his first semester in college. The number of honor points earned depended upon the number of A, B, and C grades made by the student in his freshman courses. A grade of A carried three honor points; a grade of B, two honor points; a grade of C, one honor point; and a grade of D, which was a passing mark, carried no honor point credit. The maximum number of points which a freshman taking the regulation number of courses in one semester could obtain was forty-eight.

General intelligence was measured by a combination of the Miller Mental Ability Test and the Dartmouth Completion of Definitions Test. The first test contains 120 items and the second 40, so that the maximum score was 160. The scores of the 450 students in this sample ranged from 50 to 150, the distribution being fairly normal. As a measure of interest and application it was decided to take the average number of hours per week spent in study. Information with regard to study habits was obtained by means of a questionnaire given at the beginning and again at the middle of the first semester. Among other items in the questionnaire upon which information was requested were the number of hours spent per week at meals, in sleeping, etc. These and other questions were included in order that the student might think that he was being checked upon the distribution of his total time and not upon his study habits alone. The correlation between the students' estimates of the number of hours spent in study (given on the first and second questionnaires) was .86, indicating a satisfactory degree of reliability.

As stated above, the main object of this study was to find how accurately the number of honor points which a student earns can be predicted from a knowledge of his study habits and his general intelligence. Other factors, of course, such as health, personality, previous preparation, and the like, are undoubtedly of importance in determining the number of honor points received. The two factors selected were chosen because they are important and are also objective and measurable. As the first step in solving our problem, we shall calculate the partial coefficient which shows to what extent honor points are related to general intelligence when the variable factor of study hours per week is held constant. Next the partial coefficient will be calculated which shows to what extent honor points are related to study hours when the variable effect of general intelligence is rendered constant. Apart from the employment of these partial coefficients in the regression equation from which we predict honor points, the information which they yield will prove in itself to be of considerable interest. The solution of the problem is outlined in the following series of steps; the necessary data and calculations will be found in Table 59.

* May, M. A., "Predicting Academic Success," Journal of Educational Psychology, 14 (1923), 429-440.


Step 1 

The mean and σ of each series of measures and the intercorrelations are first calculated. These intercorrelations are product-moment r's computed as shown in Chapter IX. The correlation between (1) honor points and (2) general intelligence, written $r_{12}$, is .60; the correlation between (1) honor points and (3) the number of hours per week spent on the average in study, written $r_{13}$, is .32; and the correlation between (2) general intelligence and (3) hours of study per week, written $r_{23}$, is $-.35$. The low correlation between honor points and study hours is of decided interest; but the most surprising correlation is the $-.35$ between study hours and general intelligence. Evidently the brighter the student, the less he studies.

Step 2 

Having found the intercorrelations of our three variables, we may then calculate the net correlation between (1) honor points and (2) general intelligence with the influence of (3) study hours partialled out or held constant. This net or partial coefficient of correlation, written $r_{12.3}$, is found from the following formula:

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \tag{87}$$

(p. 415)

Substitution of the values for $r_{12}$, $r_{13}$, and $r_{23}$ in the formula gives a partial coefficient, $r_{12.3}$, of .80. This means that if all of our 450 students had studied exactly the same number of hours per week, the coefficient of correlation between honor points earned and general intelligence test scores would have been .80 instead of .60. In other words, if each student spends the same number of hours in study, there is a closer correspondence between general intelligence test scores and honor points earned than there is when the number of study hours varies.

The partial coefficient of correlation between (1) honor points and (3) hours spent in study per week, with (2) general intelligence partialled out or its influence held constant, is found from the formula

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \tag{87}$$

Substitution of the values for $r_{13}$, $r_{12}$, and $r_{23}$ gives a partial coefficient, $r_{13.2}$, of .71, as against an obtained coefficient ($r_{13}$) of .32. This result means that if our group possessed the same general intelligence* there would be a much closer correspondence between the number of honor points received and the number of hours spent in study than there is when the members of the group possess varying degrees of intelligence. This is certainly the result to be expected.

* By "same general intelligence" is meant the same score on the given general intelligence tests.

The last partial coefficient of correlation, $r_{23.1}$, equals $-.72$. This coefficient gives the net correlation between (2) general intelligence and (3) study hours when the influence of (1) honor points is held constant. It is found from the formula

$$r_{23.1} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} \tag{87}$$

Like the two partial r's above, we may interpret $r_{23.1}$ to mean that the correlation between general intelligence and hours spent in study in a group in which every student earns the same number of honor points would be much higher (in the inverse direction) than the "raw" correlation between the same two factors in an unselected group. By an unselected group is meant here a group in which the number of honor points received by different students varies. It seems evident that the brighter student not only studies less than the average and dull student (since $r_{23} = -.35$) but that the brighter the student, the less he needs to study in order to reach a given standard of academic success, i.e., to earn a given number of honor points.
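The three partial r's of Step 2 can be reproduced from the primary data with a single helper. The function name is hypothetical; it implements formula (87), taking the correlation to be partialled first and the two correlations involving the controlled variable second and third.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """Formula (87): r_ab.c, the correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2))


# Intercorrelations from Step 1 (1 = honor points, 2 = intelligence, 3 = study hours)
r12, r13, r23 = 0.60, 0.32, -0.35

r12_3 = partial_r(r12, r13, r23)  # study hours held constant
r13_2 = partial_r(r13, r12, r23)  # intelligence held constant
r23_1 = partial_r(r23, r12, r13)  # honor points held constant
print(round(r12_3, 2), round(r13_2, 2), round(r23_1, 2))
```

Carried to more places, $r_{23.1}$ is about $-.715$, which the text rounds to $-.72$.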


Step 3 

Knowing the partial coefficients of correlation, we may write the multiple regression equation from which the most probable number of honor points a student will receive may be estimated when we know his score in the general intelligence test and the number of hours he studies per week. The regression equation for three variables (in deviation form) is as follows:

$$x_1 = b_{12.3}x_2 + b_{13.2}x_3 \tag{89}$$

(p. 419)

In this equation $x_1$ stands for honor points and is the dependent variable or criterion; $x_2$ and $x_3$ stand for general intelligence and study hours, respectively, and are the independent variables. Note the resemblance of this equation to the simple regression equation for two variables, $y = b_{yx}x$ (p. 312). If $x_1$ is put for $y$, and $x_2$ for $x$ in the two-variable equation, we have $x_1 = b_{12}x_2$.

When written in score form, the multiple regression equation for three variables becomes

$$(X_1 - M_1) = b_{12.3}(X_2 - M_2) + b_{13.2}(X_3 - M_3)$$

or, transposing and collecting terms,

$$X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K\ (\text{a constant}) \tag{90}$$

(p. 419)




It is clear that before we can use this equation we must find the values of the partial regression coefficients $b_{12.3}$ and $b_{13.2}$. These may be found from the formulas

$$b_{12.3} = r_{12.3}\frac{\sigma_{1.23}}{\sigma_{2.13}} \quad\text{and}\quad b_{13.2} = r_{13.2}\frac{\sigma_{1.23}}{\sigma_{3.12}} \tag{93}$$

(p. 420)

and, as we already have the values of $r_{12.3}$ and $r_{13.2}$, it is only necessary that we find $\sigma_{1.23}$, $\sigma_{2.13}$, and $\sigma_{3.12}$ (the partial σ's) in order to replace the partial regression coefficients in the equation by numerical values.

Note that the partial coefficient of correlation $r_{23.1}$, although of interest as giving us the relation between general intelligence and hours spent in study for a constant number of honor points earned, is not actually needed in the regression equation $x_1 = b_{12.3}x_2 + b_{13.2}x_3$. In order to evaluate the constants $b_{12.3}$ and $b_{13.2}$ in our regression equation, we need only $r_{12.3}$ and $r_{13.2}$. In fact, in any problem involving three variables, only two partial coefficients of correlation need be computed, if we are interested primarily in the prediction of $X_1$ scores from known values of $X_2$ and $X_3$.

Step 4

The partial σ's may be found from the formulas

$$\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}$$

$$\sigma_{2.13} = \sigma_{2.31} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} \qquad \text{(88), p. 417}$$

$$\sigma_{3.12} = \sigma_{3.21} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2}$$

Substituting the known values of the raw and partial r's in these
formulas, we find that σ1.23 = 6.3, σ2.13 = 8.9, and σ3.12 = 4.0.
(For the calculations see Table 59.)


Step 5

From the partial σ's and the partial r's the numerical values
of the partial regression coefficients b12.3 and b13.2 are found to be
.57 and 1.12, respectively. We may now write the multiple
regression equation in deviation form as

$$\bar{x}_1 = .57\,x_2 + 1.12\,x_3$$


PARTIAL AND MULTIPLE CORRELATION 413 


In order to write this multiple regression equation in score
form we replace x̄1 by (X̄1 − 18.5), x2 by (X2 − 100.6), and x3
by (X3 − 24). The equation then becomes

$$\bar{X}_1 = .57\,X_2 + 1.12\,X_3 - 66$$

Given a student's general intelligence test score (X2) and the
number of hours per week he spends in study (X3), we can estimate
from this equation the "most probable" number of honor
points he will receive during his first semester in college. Suppose
that student J. N. has a general intelligence test score of
120 and that he studies on the average twenty hours per week:
how many honor points will he most probably receive during the
first semester? Substituting X2 = 120 and X3 = 20 in the regression
equation, we find that

$$\bar{X}_1 = (.57 \times 120) + (1.12 \times 20) - 66 = 24.8, \text{ or } 25$$

The most probable number of honor points which student J. N.
will receive, therefore, using the given measures as the basis of
our forecast, is twenty-five.
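The constant −66 and J. N.'s forecast can be checked with a few lines of arithmetic. The sketch below is a minimal Python version built only on the means (18.5, 100.6, 24) and the rounded weights (.57, 1.12) quoted above. The function names are hypothetical helpers, not part of the text.

```python
# Means and partial regression coefficients quoted in the text (Table 59).
M1, M2, M3 = 18.5, 100.6, 24.0   # honor points, intelligence, study hours
B12_3, B13_2 = 0.57, 1.12


def regression_constant(b12, b13, m1, m2, m3):
    """K in X1 = b12*X2 + b13*X3 + K.

    Substituting x1 = X1 - M1, x2 = X2 - M2, x3 = X3 - M3 into the
    deviation form and collecting terms leaves K = M1 - b12*M2 - b13*M3.
    """
    return m1 - b12 * m2 - b13 * m3


def predict_honor_points(x2, x3):
    """Most probable honor points for intelligence score x2 and study hours x3."""
    k = regression_constant(B12_3, B13_2, M1, M2, M3)
    return B12_3 * x2 + B13_2 * x3 + k


K = regression_constant(B12_3, B13_2, M1, M2, M3)
print(round(K, 1))                               # -65.7, which the text rounds to -66
print(round(predict_honor_points(120, 20), 1))   # 25.1 for J. N.
```

Carrying K as −65.7 instead of −66 moves the forecast from 24.8 to 25.1. Either way, the most probable figure is twenty-five honor points.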

Step 6

This forecast, like every other "most probable" number of
honor points predicted from the regression equation, has an
"error of estimate." The standard error of estimate of any X1
predicted from the regression equation X̄1 = b12.3X2 + b13.2X3
+ K is written σ(est X1), and equals σ1.23 directly (p. 418).

The standard error of estimate in the present problem is 6.3,
and in the illustration given above, the twenty-five honor points
estimated for J. N. have a σ(est X1) of about six points. This
means that the chances are about two in three that our forecast
of twenty-five honor points will not miss the actual number of
honor points received by J. N. by more than ±6. In general
we may say that two-thirds of all predicted honor-point values
will lie within ±6 points of their actual values.
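The "two chances in three" figure is the share of a normal distribution lying within one standard deviation of its mean. The sketch below rests on one assumption that this section does not state: errors of estimate are normally distributed, with the same spread at every predicted value. Under that assumption it reproduces the ±6.3 band around J. N.'s forecast.

```python
from statistics import NormalDist

SE_EST = 6.3    # sigma(est X1) = sigma_1.23 for Table 59
FORECAST = 25   # J. N.'s most probable honor points

low, high = FORECAST - SE_EST, FORECAST + SE_EST

# Proportion of a normal distribution lying within one standard
# deviation of its mean: the basis of the "two chances in three" statement.
within_one_se = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

print(f"{low:.1f} to {high:.1f}")   # 18.7 to 31.3
print(round(within_one_se, 3))      # 0.683
```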

Step 7

The final step in the solution of our three-variable correlation
problem is the computation of the coefficient of multiple
correlation. "Multiple r," generally written R, is defined (see
p. 425) as the coefficient of correlation between scores actually
made on the criterion test and scores on the same test predicted
from the regression equation. In the present problem, R gives
the correlation between earned honor points (X1) and honor
points estimated by means of the two variables, general intelligence
(X2) and hours of study (X3), when these two are combined
into a team by means of the regression equation. The formula
for R when we are dealing with three variables is

$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} \qquad \text{(97), p. 424}$$

In the present problem R1(23) = .83. This means that if the most
probable number of honor points which each student in our
group of 450 will receive is predicted from the regression equation
given on page 413, the correlation between these 450
predicted scores and the 450 scores actually received will be .83.
Multiple R tells us to what extent X1 is determined by the combined
action of X2 and X3; or, in the present instance, to what
extent honor points are related to general intelligence and
number of study hours per week taken together.
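Formula (97) needs only two numbers from Table 59: σ1 = 11.2 and σ1.23 = 6.3. The Python sketch below evaluates it. The helper name `multiple_r` is hypothetical.

```python
import math


def multiple_r(sigma_1, sigma_1_rest):
    """Formula (97): R = sqrt(1 - sigma_1.23...n**2 / sigma_1**2)."""
    return math.sqrt(1.0 - (sigma_1_rest / sigma_1) ** 2)


R = multiple_r(11.2, 6.3)   # sigma_1 and sigma_1.23 from Table 59
print(round(R, 2))          # 0.83
print(round(R ** 2, 2))     # 0.68
```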


III. GENERAL FORMULAS FOR USE IN PARTIAL AND
MULTIPLE CORRELATION

1. Partial r's of Any Order

(1) Formulas for Partial r's

We found in Table 59 that one is able by the method of partial
correlation to find the net relationship between two variables
when the influence of a third is ruled out or held constant. By
an extension of the partial correlation method, we may obtain the
net correlation between X1 and X2 when two or more variables
have been held constant. The partial coefficient of correlation
r12.34, for example, means by analogy to r12.3 that the correlation
between X1 and X2 has been freed of the influence of both
X3 and X4; and the partial coefficient of correlation r12.345...n
means that the correlation between X1 and X2 has been freed
of the influence of a large number of disturbing factors.

In every partial coefficient of correlation, e.g., r12.34, the
primary subscripts to the left of the point (1 and 2) define the
two variables whose net correlation we are seeking. The secondary
subscripts to the right of the point (3 and 4) denote the
variables ruled out or held constant. The order in which the
secondary subscripts are written is immaterial, i.e., r12.34 = r12.43.
The order of the primary subscripts is of importance, however,
as it tells us which variable is taken to be dependent and which
independent. Thus r12 means that X1 is dependent (is to be
predicted from X2), while r21 means that X2 is dependent (is
to be predicted from X1). The numerical values of r12 and r21 are,
of course, the same. The order of a partial r is determined by
the number of its secondary subscripts. Thus r12, an "entire"
or "total" r, is a coefficient of zero order; r12.3 is a partial r of
the first order; r12.345 is a coefficient of the third order.

The general formula for a partial r is

$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{\sqrt{1 - r_{1n.34\ldots(n-1)}^2}\,\sqrt{1 - r_{2n.34\ldots(n-1)}^2}} \qquad (87)$$

(partial correlation coefficient in terms of the coefficients
of lower order, n variables)

From this formula partial r's of any given order may be found.
In a five-variable problem, for example, (n − 1) = 4 and n = 5,
so that r12.345 is written

$$r_{12.345} = \frac{r_{12.34} - r_{15.34}\,r_{25.34}}{\sqrt{1 - r_{15.34}^2}\,\sqrt{1 - r_{25.34}^2}}$$

that is, in terms of the partial r's of the second order. These
second-order partial r's must then be computed by formula (87)
from r's of the first order before the third-order r, r12.345, can be
evaluated. In calculating partial r's, Table 60 may be used to
read √(1 − r²) values.
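Formula (87) is naturally recursive: a partial r of order p is built from three partial r's of order p − 1, down to the zero-order r's. The sketch below applies it to the zero-order values of Table 59 (r12 = .60, r13 = .32, r23 = −.35). The dictionary layout and the name `partial_r` are illustrative assumptions. For a five-variable problem one would add the other zero-order r's to `ZERO_ORDER` and call, e.g., `partial_r(1, 2, (3, 4, 5))`.

```python
import math
from functools import lru_cache

# Zero-order r's from Table 59: 1 = honor points, 2 = general
# intelligence, 3 = study hours per week.
ZERO_ORDER = {
    frozenset({1, 2}): 0.60,
    frozenset({1, 3}): 0.32,
    frozenset({2, 3}): -0.35,
}


@lru_cache(maxsize=None)
def partial_r(i, j, controls=()):
    """r_ij.controls computed by formula (87).

    The last variable in `controls` is partialled out of coefficients
    one order lower, so each call recurses down to the zero-order r's.
    """
    if not controls:
        return ZERO_ORDER[frozenset({i, j})]
    *lower, k = controls
    lower = tuple(lower)
    r_ij = partial_r(i, j, lower)
    r_ik = partial_r(i, k, lower)
    r_jk = partial_r(j, k, lower)
    return (r_ij - r_ik * r_jk) / (math.sqrt(1 - r_ik ** 2) * math.sqrt(1 - r_jk ** 2))


print(round(partial_r(1, 2, (3,)), 2))   # r12.3 -> 0.8
print(round(partial_r(1, 3, (2,)), 2))   # r13.2 -> 0.71
print(round(partial_r(2, 3, (1,)), 2))   # r23.1 -> -0.72
```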

There are several methods akin to partial correlation which
are useful in certain special problems. Two of these, part correlation
and semi-partial correlation, may be mentioned briefly.
These procedures differ from partial correlation in that they give
the net effect secured by ruling out the influence of one or more
variables from only one of the two correlated measures, instead
of from both. For example, one may wish to know the relation
(semi-partial) between reaction time and speed of reading when
differences in size of vocabulary are held constant with respect
to reading only. Part correlation and semi-partial correlation
have not been widely used in mental measurement. For a discussion
of formulas and for illustrations see references below.*

TABLE 60

A TABLE TO INFER THE VALUE OF √(1 − r²) FROM A
GIVEN VALUE OF r

    r   √(1 − r²)        r   √(1 − r²)        r   √(1 − r²)
  .00   1.0000      .34    .9404      .68    .7332
  .01    .9999      .35    .9367      .69    .7238
  .02    .9998      .36    .9330      .70    .7141
  .03    .9995      .37    .9290      .71    .7042
  .04    .9992      .38    .9250      .72    .6940
  .05    .9987      .39    .9208      .73    .6834
  .06    .9982      .40    .9165      .74    .6726
  .07    .9975      .41    .9121      .75    .6614
  .08    .9968      .42    .9075      .76    .6499
  .09    .9959      .43    .9028      .77    .6380
  .10    .9950      .44    .8980      .78    .6258
  .11    .9939      .45    .8930      .79    .6131
  .12    .9928      .46    .8879      .80    .6000
  .13    .9915      .47    .8827      .81    .5864
  .14    .9902      .48    .8773      .82    .5724
  .15    .9887      .49    .8717      .83    .5578
  .16    .9871      .50    .8660      .84    .5426
  .17    .9854      .51    .8602      .85    .5268
  .18    .9837      .52    .8542      .86    .5103
  .19    .9818      .53    .8480      .87    .4931
  .20    .9798      .54    .8417      .88    .4750
  .21    .9777      .55    .8352      .89    .4560
  .22    .9755      .56    .8285      .90    .4359
  .23    .9732      .57    .8216      .91    .4146
  .24    .9708      .58    .8146      .92    .3919
  .25    .9682      .59    .8074      .93    .3676
  .26    .9656      .60    .8000      .94    .3412
  .27    .9629      .61    .7924      .95    .3122
  .28    .9600      .62    .7846      .96    .2800
  .29    .9570      .63    .7766      .97    .2431
  .30    .9539      .64    .7684      .98    .1990
  .31    .9507      .65    .7599      .99    .1411
  .32    .9474      .66    .7513     1.00    .0000
  .33    .9440      .67    .7424


(2) Significance of a Partial r

The significance of a partial r (like that of a zero-order r) may
be tested against the null hypothesis. We may use either Table
49, page 299, or Table 61, column headed "2 variables." The
degrees of freedom for a partial r are N − m, where N = number
of cases and m = number of variables entering into the partial r.
Thus if r12.345 = .40 and N = 75, m = 5 and N − m = 75 − 5,
or 70.

In Table 59, r12.3 = .80, N = 450, m = 3, and N − m = 447.
From Table 61, column "2," the r entries by interpolation for
447 degrees of freedom are .093 and .121 at the .05 and .01 levels.
The probability that the obtained r12.3 of .80 arose from fluctuations
of sampling is much less than .01; and this is true, also, of r13.2
of .71 and r23.1 of −.72. All three partial r's, in fact, are highly
significant.
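When the degrees of freedom lie far beyond the tabled rows, a critical r can be approximated directly. The sketch below is a large-sample approximation, not the method behind Table 61. It solves t = r√df / √(1 − r²) for r and substitutes the normal deviate for t. At 447 degrees of freedom it lands close to the .093 and .121 read from the table.

```python
import math
from statistics import NormalDist


def approx_critical_r(df, alpha):
    """Large-sample approximation to the two-tailed critical r.

    Solves t = r * sqrt(df) / sqrt(1 - r**2) for r, with the normal
    deviate standing in for t.  Reasonable only when df is large.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / math.sqrt(z ** 2 + df)


for alpha in (0.05, 0.01):
    print(alpha, round(approx_critical_r(447, alpha), 3))
# 0.05 0.092
# 0.01 0.121
```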
2. Partial σ's of Any Order

General Formulas

Just as the correlation between two sets of scores can be determined
when the influence of 1, 2, 3 . . . n factors is held
constant, so the variability (σ) of a set of scores can be computed
when the influence of 1, 2, 3 . . . variables is ruled out. As
an illustration, consider σ1.23 of Table 59. This partial σ gives
the variability of X1 (honor points) freed of the influence upon
variability exerted by the two factors X2 (general intelligence)
and X3 (study hours per week). The general formula for partial
σ's of any order is

$$\sigma_{1.234\ldots n} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{14.23}^2}\cdots\sqrt{1 - r_{1n.23\ldots(n-1)}^2} \qquad (88)$$

(partial σ for n variables)
* Ezekiel, M., Methods of Correlation Analysis (2nd ed., 1941), p. 213.
Dunlap, J. W., and Cureton, E. E., "On the Analysis of Causation,"
Journal of Educational Psychology, 21 (1930), 657-680.


This formula may be used to compute the net σ's in correlation
problems which involve any number of variables. In a five-variable
problem, for example, σ1.2345 is written

$$\sigma_{1.2345} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{14.23}^2}\,\sqrt{1 - r_{15.234}^2}$$

This partial σ is of the fourth order since it has four secondary
subscripts, and the order of a partial σ, like the order of a partial
r, is determined by the number of its secondary subscripts.
By a simple rearrangement of the secondary subscripts, any
higher-order σ may be written in more than one way. A partial
σ of the second order may be written in two ways: for example,
σ1.23, which is given on page 412 as

$$\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}$$

may also be written

$$\sigma_{1.32} = \sigma_1\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{12.3}^2}$$

In like manner σ2.13 may be written

$$\text{(1)}\ \ \sigma_{2.13} = \sigma_2\sqrt{1 - r_{21}^2}\,\sqrt{1 - r_{23.1}^2} \qquad\text{or}\qquad \text{(2)}\ \ \sigma_{2.31} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{21.3}^2}$$

and σ3.12 may be written

$$\text{(1)}\ \ \sigma_{3.12} = \sigma_3\sqrt{1 - r_{31}^2}\,\sqrt{1 - r_{32.1}^2} \qquad\text{or}\qquad \text{(2)}\ \ \sigma_{3.21} = \sigma_3\sqrt{1 - r_{32}^2}\,\sqrt{1 - r_{31.2}^2}$$

These alternate forms of a partial σ are useful as a check
upon arithmetic calculations; also they make unnecessary the
calculation of unused partial r's. Use of the second forms of
σ2.13 and σ3.12 instead of the first (see Table 59), for example,
makes it unnecessary to compute r23.1 so far as the partial σ's
in the regression equation are concerned. Furthermore, if r23.1
is not wanted for other purposes, it need not be calculated at
all (see p. 412). Two partial r's are all that are required in order
to write the regression equation of a three-variable problem.
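The claim that the alternate forms agree can be checked numerically. The sketch below computes the first-order partial r's at full precision from the zero-order r's of Table 59. It then evaluates formula (88) both ways for σ1.23 and for σ2.13. The helper names are hypothetical. The text's σ2.13 = 8.9 reflects r12.3 rounded to .80; unrounded, both forms give about 8.84.

```python
import math


def first_order_r(r_ij, r_ik, r_jk):
    """r_ij.k from zero-order r's (formula 87 with one variable held constant)."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def partial_sigma(sigma, rs):
    """Formula (88): sigma times sqrt(1 - r**2) for each r in the chain."""
    result = sigma
    for r in rs:
        result *= math.sqrt(1 - r ** 2)
    return result


# Zero-order values from Table 59.
s1, s2 = 11.2, 15.8
r12, r13, r23 = 0.60, 0.32, -0.35

r12_3 = first_order_r(r12, r13, r23)
r13_2 = first_order_r(r13, r12, r23)
r23_1 = first_order_r(r23, r12, r13)

# sigma_1.23 in its two forms
print(round(partial_sigma(s1, [r12, r13_2]), 2), round(partial_sigma(s1, [r13, r12_3]), 2))
# sigma_2.13: form (1) needs r23.1, form (2) needs only r12.3
print(round(partial_sigma(s2, [r12, r23_1]), 2), round(partial_sigma(s2, [r23, r12_3]), 2))
```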




3. Multiple Regression Equations and Partial Regression Coefficients

(1) The Multiple Regression Equation for Any Number of Variables

The regression equation which expresses the relationship between
a single dependent or criterion variable, X1, and any
number of independent variables, X2, X3, X4 . . . Xn, may be
written in deviation form as follows:

$$\bar{x}_1 = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \cdots + b_{1n.23\ldots(n-1)}\,x_n \qquad (89)$$

(regression equation, deviation form, for n variables)

and in score form

$$\bar{X}_1 = b_{12.34\ldots n}\,X_2 + b_{13.24\ldots n}\,X_3 + \cdots + b_{1n.23\ldots(n-1)}\,X_n + K \qquad (90)$$

(regression equation, score form, for n variables)

The partial regression coefficients b12.34...n, b13.24...n, etc., give
the weights to be attached to the scores of each independent
variable when X1 is to be estimated from all of these in combination.
Furthermore, the regression coefficients give the weight
which each variable exerts in determining X1 when the influence
of the other variables is excluded. Hence, we can tell from
the regression equation just what role each of the several test
variables plays in determining the score on Test 1, the test taken
as the criterion.
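The text reaches the b's through partial r's and partial σ's. An equivalent route, swapped in here for illustration and not taken from the text, solves the normal equations in correlation form directly. The β's satisfy (matrix of intercorrelations of X2 . . . Xn) · β = (r12, r13, . . . r1n), and each b is then β·σ1/σj. The pure-Python sketch below uses simple Gaussian elimination and assumes a small, well-conditioned matrix. `solve` and `regression_weights` are hypothetical helpers. With the Table 59 values it gives weights of about .575 and 1.127, which the text, working from rounded partials, reports as .57 and 1.12.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def regression_weights(r_pred, r_crit, sigma_crit, sigma_pred):
    """Return (betas, bs) for predicting X1 from X2 . . . Xn.

    r_pred: intercorrelations of the predictors (1s on the diagonal);
    r_crit: r12, r13, ..., r1n; sigmas are zero-order SDs.
    """
    betas = solve(r_pred, r_crit)
    bs = [beta * sigma_crit / s for beta, s in zip(betas, sigma_pred)]
    return betas, bs


betas, bs = regression_weights([[1.0, -0.35], [-0.35, 1.0]], [0.60, 0.32], 11.2, [15.8, 6.0])
print([round(v, 2) for v in betas])   # [0.81, 0.6]
print([round(v, 3) for v in bs])      # [0.575, 1.127]
```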
(2) The Multiple Regression Equation for Three Variables (Special Form)

When a problem involves only three variables, the regression
equation, as we have seen, is written

$$\bar{x}_1 = b_{12.3}\,x_2 + b_{13.2}\,x_3 \quad \text{(deviation form)}$$

If the partial r's and partial σ's are of no special interest, it is
possible to express the equation above in a somewhat more
convenient form for calculation, as follows:

$$\bar{x}_1 = \frac{\sigma_1(r_{12} - r_{13}r_{23})}{\sigma_2(1 - r_{23}^2)}\,x_2 + \frac{\sigma_1(r_{13} - r_{12}r_{23})}{\sigma_3(1 - r_{23}^2)}\,x_3 \qquad (91)$$

(regression equation for three variables, special form)

or in score form

$$\bar{X}_1 = \frac{\sigma_1(r_{12} - r_{13}r_{23})}{\sigma_2(1 - r_{23}^2)}\,X_2 + \frac{\sigma_1(r_{13} - r_{12}r_{23})}{\sigma_3(1 - r_{23}^2)}\,X_3 + K \qquad (92)$$

(regression equation for three variables, special form)

As this equation involves only zero-order r's and zero-order
σ's, X1 may be estimated from it without the computation of
any partial r's or partial σ's. We may illustrate using the data
given in Table 59, page 407. Substituting σ1 = 11.2, σ2 = 15.8,
σ3 = 6.0, r12 = .60, r13 = .32, and r23 = −.35, we have

$$\bar{x}_1 = \frac{11.2(.60 + .32 \times .35)}{15.8(1 - .35^2)}\,x_2 + \frac{11.2(.32 + .60 \times .35)}{6.0(1 - .35^2)}\,x_3$$

$$\bar{x}_1 = .575\,x_2 + 1.127\,x_3$$

which checks, within rounding, the regression equation x̄1 = .57x2
+ 1.12x3 as calculated in Table 59 from rounded partial r's and σ's.
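Because formula (91) uses only zero-order quantities, it translates directly into a short function. The sketch below is a minimal Python version; the name `three_variable_bs` is hypothetical. It reproduces the coefficients .575 and 1.127 for the Table 59 data.

```python
def three_variable_bs(s1, s2, s3, r12, r13, r23):
    """Formula (91): b12.3 and b13.2 from zero-order r's and sigmas."""
    denom = 1 - r23 ** 2
    b12_3 = s1 * (r12 - r13 * r23) / (s2 * denom)
    b13_2 = s1 * (r13 - r12 * r23) / (s3 * denom)
    return b12_3, b13_2


b12_3, b13_2 = three_variable_bs(11.2, 15.8, 6.0, 0.60, 0.32, -0.35)
print(round(b12_3, 3), round(b13_2, 3))   # 0.575 1.127
```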
(3) Partial Regression Coefficients (b's)

Partial regression coefficients may be computed from the
formula

$$b_{12.34\ldots n} = r_{12.34\ldots n}\,\frac{\sigma_{1.234\ldots n}}{\sigma_{2.134\ldots n}} \qquad (93)$$

(partial regression coefficients in terms of partial coefficients of
correlation and standard errors of estimate, n variables)

When the problem involves three variables, the regression coefficients,
b12.3 and b13.2, are, like r12.3 and r13.2, of the first order.
The first regression coefficient, b12.3, equals r12.3 σ1.23/σ2.13, and
the second regression coefficient, b13.2, equals r13.2 σ1.23/σ3.12.

Partial regression coefficients which involve more than three
variables may be calculated from formula (93). In a five-variable
problem, for example, the regression coefficients (of the
third order) are

$$b_{12.345} = r_{12.345}\,\frac{\sigma_{1.2345}}{\sigma_{2.1345}}, \qquad b_{13.245} = r_{13.245}\,\frac{\sigma_{1.2345}}{\sigma_{3.1245}}, \quad \text{etc.}$$

In order to find these partial regression coefficients we first compute
the third-order partial r's and the fourth-order partial σ's.

The b's are determined by the σ's of the tests, and these in
turn depend upon the units in terms of which the tests are scored.
The b coefficients give the weights of scores in the independent
variables, X2, X3, etc., but not the contribution of these variables
without regard to the scoring system employed. The latter
contribution is given by the "beta weights," described in (4)
below.


(4) The Beta (β) Coefficients

When expressed in terms of standard or σ scores, partial
regression coefficients are usually called beta coefficients. The
beta coefficients may be calculated directly from the b's as follows:

$$\beta_{12.34\ldots n} = b_{12.34\ldots n}\,\frac{\sigma_2}{\sigma_1} \qquad (94)$$

(beta coefficients calculated from partial regression coefficients)

The multiple regression equation for n variables may also be
written in standard scores as

$$\bar{z}_1 = \beta_{12.34\ldots n}\,z_2 + \beta_{13.24\ldots n}\,z_3 + \cdots + \beta_{1n.23\ldots(n-1)}\,z_n \qquad (95)$$

(multiple regression equation in terms of standard scores)

Beta coefficients are often called "beta weights" to distinguish
them from the "score weights" (b's) of the ordinary multiple
regression equation. When all of our tests have been expressed
in standard scores (all means = .00 and all σ's = 1.00),
differences in test units as well as differences in variability are
allowed for. We are then able to determine from the correlations
alone the relative weight with which each independent
variable "enters in" or contributes to the criterion, independently
of the other factors.

To illustrate with the data in Table 59, we find that

$$\beta_{12.3} = .57 \times \frac{15.8}{11.2} \qquad\text{and}\qquad \beta_{13.2} = 1.12 \times \frac{6.0}{11.2} = .60$$

The first of these is .80 with the rounded weight .57, and .81 when
b12.3 is carried to a third decimal (.575). From (95) above we get

$$\bar{z}_1 = .81\,z_2 + .60\,z_3$$

This equation should be compared with the multiple regression
equation x̄1 = .57x2 + 1.12x3 in Table 59, which gives the weights
to be attached to the scores in X2 and X3. The weights of .57
and 1.12 tell us the amount by which scores in X2 and X3 must
be multiplied in order to give the "best" prediction of X1.
But these weights do not give us the relative importance of
general intelligence and study habits in determining the number
of honor points a freshman will receive. This information is
given by the beta weights. It is of interest to note that while
the actual score weights are as 1:2 (.57 to 1.12), the independent
contributions of general intelligence (z2) and study habits (z3)
are in the ratio of .81 to .60, or about 4:3. When the variabilities
(σ's) of our tests are all equal and scoring units are comparable,
general intelligence has a proportionately greater influence than
study habits in determining academic achievement. This is
certainly the result to be expected.
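Formula (94) is a single rescaling by the ratio of the zero-order σ's. The sketch below applies it to both the text's rounded weight .57 and the three-place value .575. This shows why the first beta is reported as .81 rather than .80. The helper name is hypothetical.

```python
def beta_from_b(b, sigma_predictor, sigma_criterion):
    """Formula (94): beta = b * sigma_j / sigma_1 (zero-order sigmas)."""
    return b * sigma_predictor / sigma_criterion


SIGMA_1, SIGMA_2, SIGMA_3 = 11.2, 15.8, 6.0   # Table 59

for b12 in (0.57, 0.575):   # the rounded weight, then one more place
    print(round(beta_from_b(b12, SIGMA_2, SIGMA_1), 2))   # 0.8, then 0.81
beta_3 = beta_from_b(1.12, SIGMA_3, SIGMA_1)
print(round(beta_3, 2))        # 0.6
print(round(0.81 / 0.60, 2))   # 1.35, roughly the 4:3 ratio in the text
```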


4. The Standard Error of Estimate for Multiple Regression Equations

All X1 scores estimated from a multiple regression equation
have a standard error of estimate which measures the error
made in taking scores given by the regression equation instead
of actual scores (those earned on the criterion test). The standard
error of estimate is given directly by σ1.234...n as follows:

$$\sigma_{(\mathrm{est}\,X_1)} = \sigma_{1.234\ldots n} \qquad (96)$$

(standard error of estimate for n variables)

Since σ1.234...n must be computed in order to evaluate the
partial regression coefficients (p. 421), σ(est X1) is always calculated
in the course of the problem. In Table 59, the σ(est X1) of a
prediction of honor points is 6.3. The chances are about
seven in ten, or two in three, that the "most probable" honor-point
score forecast for any student will be in error by six points
or less.

It is worth while examining further into the meaning of
σ(est X1). This standard error of estimate equals σ1.23, and the
latter indicates the effect upon the variability of Test 1 (honor
points) obtained by eliminating (or holding constant) the influence
of Tests 2 and 3 (general intelligence and study effort).
The smaller σ1.23 is with respect to σ1, the greater the influence
exerted by our two factors upon Test 1's variability. In Table
59 it is clear that in ruling out the variability in Test 1 attributable
to Tests 2 and 3, we reduce σ1 from 11.2 to 6.3 (σ1.23), or by
nearly one-half. This means that students alike in general intelligence
and in study habits differ much less in scholastic
achievement than do students in general.

From the multiple regression equation X̄1 = .57X2 + 1.12X3
− 66 (see p. 413), X1 (honor points) can be predicted with a
smaller error of estimate than from any other linear equation.
Put differently, the standard error of estimate is a minimum
when the regression equation is used to estimate X1 scores.*
Hence, the values of X1 predicted from the multiple regression
equation are the "best estimates" of the actual X1 values which
can be made from a linear equation containing the given variables.


* Yule, G. U., and Kendall, M. G., An Introduction to the Theory of
Statistics (12th ed., 1940), pp. 262-267.

5. The Coefficient of Multiple Correlation, R

(1) General Formulas

The correlation between a single dependent or criterion
variable X1 and (n − 1) independent variables combined by
means of a multiple regression equation is given by the formula


$$R_{1(234\ldots n)} = \sqrt{1 - \frac{\sigma_{1.234\ldots n}^2}{\sigma_1^2}} \qquad (97)$$

(multiple correlation coefficient in terms of partial σ's, n variables)

in which

R1(234...n) = the coefficient of multiple correlation
σ1 = the standard deviation of the criterion (X1) scores
σ1.234...n = the variability left in Test 1 when the variability
of Tests 2, 3 . . . n is held constant through partial correlation.

When there are only three variables, the multiple coefficient
of correlation becomes

$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}}$$

when there are five variables

$$R_{1(2345)} = \sqrt{1 - \frac{\sigma_{1.2345}^2}{\sigma_1^2}}$$


If we replace σ1.234...n in formula (97) by its value in terms
of the entire and partial r's [see formula (88)], we may write
the general formula for R1(234...n) as follows:

$$R_{1(234\ldots n)} = \sqrt{1 - (1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2)\cdots(1 - r_{1n.23\ldots(n-1)}^2)} \qquad (98)$$

(multiple coefficient of correlation in terms of partial coefficients
of correlation, n variables)

Since a higher-order σ may be written in a variety of ways, the
number depending upon its order (see p. 417), there are several
alternate forms for R. These serve as valuable means of checking
the accuracy of our arithmetical calculations. In a three-variable
problem, for example, R1(23) may be written as

$$R_{1(23)} = \sqrt{1 - (1 - r_{12}^2)(1 - r_{13.2}^2)}$$

or as

$$R_{1(23)} = \sqrt{1 - (1 - r_{13}^2)(1 - r_{12.3}^2)}$$
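The two three-variable forms of (98) can be used as an arithmetic check, as the text suggests. The sketch below computes r13.2 and r12.3 at full precision from Table 59's zero-order r's and evaluates both forms. `first_order_r` is a hypothetical helper. Unrounded, both give R of about .825. The .83 in Step 7 comes from the rounded σ1.23 = 6.3.

```python
import math


def first_order_r(r_ij, r_ik, r_jk):
    """r_ij.k from zero-order r's (formula 87 with one variable held constant)."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


r12, r13, r23 = 0.60, 0.32, -0.35
r13_2 = first_order_r(r13, r12, r23)
r12_3 = first_order_r(r12, r13, r23)

R_form_a = math.sqrt(1 - (1 - r12 ** 2) * (1 - r13_2 ** 2))
R_form_b = math.sqrt(1 - (1 - r13 ** 2) * (1 - r12_3 ** 2))
print(round(R_form_a, 3), round(R_form_b, 3))   # 0.825 0.825
```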




The standard error of estimate is a minimum when the multiple
regression equation is employed in estimating X1 scores
(p. 423). Hence the multiple coefficient of correlation, R, is the
maximum correlation obtainable between actual X1 scores and
X1 scores estimated from a knowledge of the variables X2, X3
. . . Xn in the regression equation. The truth of this statement
is contingent upon linearity of regression in all of the correlations.
R indicates how accurately a given combination of
variables represents the actual values of X1 (the criterion) when
our test scores are combined in accordance with the "best"
linear equation.


(2) Multiple R in Terms of β Coefficients

R² may be expressed in terms of the beta coefficients and the
zero-order r's:

$$R_{1(234\ldots n)}^2 = \beta_{12.34\ldots n}\,r_{12} + \beta_{13.24\ldots n}\,r_{13} + \cdots + \beta_{1n.23\ldots(n-1)}\,r_{1n} \qquad (99)$$

(multiple R² in terms of β coefficients and zero-order r's)

For three variables (99) becomes

$$R_{1(23)}^2 = \beta_{12.3}\,r_{12} + \beta_{13.2}\,r_{13}$$

From page 422 we find β12.3 = .81 and β13.2 = .60; and from
Table 59, r12 = .60 and r13 = .32. Substituting in (99)
above we get

$$R_{1(23)}^2 = .81 \times .60 + .60 \times .32 = .49 + .19 = .68$$

$$R_{1(23)} = .82$$

which agrees, within rounding, with the value of .83 obtained from
formula (97).


R²1(234...n) gives the proportion of the variance of the criterion
measure (X1) attributable to the joint action of the variables
X2, X3 . . . Xn. As shown above, R²1(23) = .68; and, accordingly,
68% of whatever makes freshmen differ in (1) school
achievement can be attributed to differences in (2) general intelligence
and (3) study habits. By means of formula (99) the
total contribution of .68 can be broken down further into the
independent contributions of general intelligence (X2) and study
habits (X3). Thus from the equation R²1(23) = .49 + .19, we
know that 49% is the contribution of general intelligence to the
variance of honor points, and 19% is the contribution of study
habits. The remaining 32% of the variance of X1 must be attributed
to factors not measured in our problem.
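The breakdown of R² by formula (99) is a sum of products of each beta weight with the corresponding zero-order r. The sketch below reproduces the .49 and .19 shares from the values quoted in the text. The dictionary layout is only an illustrative choice. Unrounded, the sum is .678, so R comes out as .82 here.

```python
import math

# Beta weights and zero-order r's with the criterion, as quoted in the text.
betas = {"general intelligence": 0.81, "study habits": 0.60}
r_with_criterion = {"general intelligence": 0.60, "study habits": 0.32}

contributions = {name: betas[name] * r_with_criterion[name] for name in betas}
r_squared = sum(contributions.values())

for name, share in contributions.items():
    print(f"{name}: {share:.3f}")   # 0.486 and 0.192
print(f"R^2 = {r_squared:.3f}, R = {math.sqrt(r_squared):.2f}")   # R^2 = 0.678, R = 0.82
```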


(3) The Significance of R

Multiple R is positive,* always less than 1.00, and never smaller
than the numerical value of any of the correlation coefficients r12,
r13, . . . r1n. The significance of an R can best be tested, perhaps,
against the null hypothesis by means of Table 61. This table must
be entered

TABLE 61

COEFFICIENTS OF CORRELATION SIGNIFICANT AT THE 5% LEVEL
AND AT THE 1% LEVEL FOR VARYING DEGREES OF FREEDOM

(For each number of degrees of freedom, the upper line gives the
5% value and the lower line the 1% value.)

Degrees                  Number of Variables
  of
Freedom      2      3      4      5      6      7      9
      1   .997   .999   .999   .999  1.000  1.000  1.000
         1.000  1.000  1.000  1.000  1.000  1.000  1.000
      2   .950   .975   .983   .987   .990   .992   .994
          .990   .995   .997   .998   .998   .998   .999
      3   .878   .930   .950   .961   .968   .973   .979
          .959   .976   .983   .987   .990   .991   .993
      4   .811   .881   .912   .930   .942   .950   .961
          .917   .949   .962   .970   .976   .979   .984
      5   .754   .836   .874   .898   .914   .925   .941
          .874   .917   .937   .949   .957   .963   .971
      6   .707   .795   .839   .867   .886   .900   .920
          .834   .886   .911   .927   .938   .946   .957
      7   .666   .758   .807   .838   .860   .876   .900
          .798   .855   .885   .904      …   .928   .942
      8   .632   .726   .777   .811   .835   .854   .880
          .765   .827   .860   .882   .898   .909   .926
      9   .602   .697   .750   .786   .812   .832   .861
          .735   .800   .836   .861   .878   .891   .911

* Since R is always positive, chance errors are cumulative and may be
considerable if the sample is small and the number of variables large. For
the correction of R for chance errors, see Formula 100, page 451.


TABLE 61 (Continued)

Degrees                  Number of Variables
  of
Freedom      2      3      4      5      6      7      9
     10   .576   .671   .726      …   .790   .812   .843
          .708   .776   .814   .840   .859   .874   .896
     11   .553   .648      …   .741   .770   .792   .826
          .684   .753   .793   .821   .841   .857   .880
     12   .532   .627   .683   .722   .751   .774   .809
          .661   .732   .773   .802   .824   .841   .866
     13   .514   .608   .664   .703   .733   .757   .794
          .641   .712   .755   .785   .807   .825   .852
     14   .497   .590   .646   .686   .717      …   .779
          .623   .694   .737   .768   .792   .810   .838
     15   .482   .574   .630   .670   .701   .726   .765
          .606   .677   .721   .752      …   .796   .825
     16   .468   .559   .615   .655   .686   .712   .751
          .590   .662   .706   .738   .762   .782      …
     17   .456   .545   .601   .641   .673   .698   .738
          .575   .647   .691   .724   .749   .769   .800
     18   .444   .532   .587   .628   .660   .686   .726
          .561   .633   .678   .710   .736   .756   .789
     19   .433   .520   .575      …   .647   .674   .714
          .549   .620   .665   .698   .723   .744   .778
     20   .423   .509   .563   .604   .636   .662   .703
          .537   .608   .652   .685   .712      …   .767
     21   .413   .498   .552      …   .624   .651   .693
          .526   .596   .641   .674   .700   .722   .756
     22   .404   .488   .542   .582   .614   .640   .682
          .515   .585   .630   .663   .690   .712   .746
     23   .396   .479   .532   .572   .604   .630   .673
          .505   .574   .619      …   .679      …      …
     24   .388   .470   .523   .562   .594   .621   .663
          .496   .565   .609   .642   .669   .692   .727
     25   .381   .462   .514   .553   .585   .612   .654
          .487   .555   .600   .633   .660   .682   .718
     26   .374   .454   .506   .545   .576   .603   .645
          .478   .546   .590   .624      …   .673   .709
     27   .367   .446      …      …   .568   .594   .637
          .470   .538   .582   .615   .642      …   .701
     28   .361   .439   .490   .529   .560   .586   .629
          .463   .529   .573   .606   .634      …   .692
     29   .355   .432   .482      …   .552   .579   .621
          .456   .522   .565      …   .625   .648   .685
     30   .349   .426      …      …   .545   .571   .614
          .449   .514   .558      …   .618   .640   .677
     35   .325   .397   .445   .482      …   .538   .580
          .418   .481   .523   .556   .582   .605   .642
     40   .304   .373   .419      …   .484   .509   .551
          .393   .454   .494      …   .552   .576   .612
     45   .288   .353   .397      …   .460   .485   .526
          .372   .430   .470      …   .527   .549   .586
     50   .273   .336   .379      …   .440   .464   .504
          .354   .410   .449      …   .504   .526   .562
     60   .250   .308   .348      …   .406   .429   .467
          .325   .377   .414      …   .466   .488   .523
     70   .232   .286   .324      …   .379   .401   .438
          .302   .351   .386      …   .436   .456   .491
     80   .217   .269   .304      …   .356      …   .413
          .283   .330   .362      …      …   .434   .464
     90   .205   .254   .288   .315   .338   .358   .392
          .267   .312   .343   .368   .390   .409      …
    100   .195   .241   .274   .300   .322   .341   .374
          .254   .297   .327   .351   .372   .390   .421
    125   .174   .216   .246   .269   .290   .307   .338
          .228   .266   .294   .316   .335   .352   .381
    150   .159   .198   .225   .247   .266   .282   .310
          .208   .244   .270      …   .308   .324   .351
    200   .138   .172   .196   .215   .231      …   .271
          .181   .212   .234   .253   .269   .283      …
    300   .113   .141   .160   .176   .190   .202   .223
          .148   .174      …   .208   .221   .233   .253
    400   .098   .122   .139   .153      …   .176   .194
          .128   .151   .167   .180   .192   .202   .220
    500   .088   .109   .124   .137   .148      …   .174
          .115   .135   .150   .162   .172      …      …
   1000   .062   .077   .088   .097   .105      …   .124
          .081   .096   .106   .115      …      …   .141


with N − m degrees of freedom and with the number of variables
(m) in the problem. To illustrate with Table 59, R = .83,
N = 450, m = 3, and N − m = 450 − 3, or 447. From the column
headed "3" in Table 61 we read that for 447 degrees of freedom
the R's at the .05 and .01 levels (by interpolation) are .116 and
.143. Only once in twenty trials would an R of .116 arise by
sampling fluctuations on the null hypothesis, and only once in
100 trials would an R of .143 occur. As our R is very much
larger than .14, it is highly significant. Table 61 may be used
with problems involving up to nine variables. Suppose that
R1(2345) = .526 and N = 40. From the column headed "5 variables"
in Table 61, we find that for 40 − 5, or 35, degrees of freedom,
the R's are .482 and .556 at the .05 and .01 levels. The
obtained R is significant, therefore, at the .05, but not at the
.01, level.
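The interpolation for 447 degrees of freedom is straight-line interpolation between the 400 and 500 rows of column "3". The sketch below reproduces .116 and .143. As a cross-check it also evaluates a closed form that is specific to three variables and is not given in the text. With two predictors, the F test of R has 2 and df degrees of freedom, and its critical value works out to R² = 1 − α^(2/df). The function names are hypothetical.

```python
import math


def interpolate(df, df_low, value_low, df_high, value_high):
    """Straight-line interpolation between two rows of Table 61."""
    fraction = (df - df_low) / (df_high - df_low)
    return value_low + fraction * (value_high - value_low)


def exact_critical_r_three_variables(df, alpha):
    """Critical R when m = 3 (two predictors).

    The upper alpha point of F with 2 and df degrees of freedom is
    (df / 2) * (alpha ** (-2 / df) - 1), which gives R**2 = 1 - alpha ** (2 / df).
    """
    return math.sqrt(1 - alpha ** (2 / df))


# Column "3" of Table 61 at 400 and 500 degrees of freedom.
TABLE_400 = {0.05: 0.122, 0.01: 0.151}
TABLE_500 = {0.05: 0.109, 0.01: 0.135}

R_OBTAINED = 0.83
for level in (0.05, 0.01):
    by_table = interpolate(447, 400, TABLE_400[level], 500, TABLE_500[level])
    exact = exact_critical_r_three_variables(447, level)
    print(level, round(by_table, 3), round(exact, 3), R_OBTAINED > by_table)
# 0.05 0.116 0.115 True
# 0.01 0.143 0.143 True
```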


IV. SPURIOUS CORRELATION

The correlation between two sets of test scores is said to be
spurious when it is due in some part, at least, to factors other
than those which determine performance in the tests themselves.
In general, the cause of spurious correlation lies in a failure to
control conditions; and the most usual effect of this lack of
control is a "boosting" or inflation of the coefficient. Some of
the situations which may lead to spurious correlation will be
given in this section.


1. Spurious Correlation Arising from Heterogeneity

We have shown elsewhere (p. 404) how a lack of uniformity
in age conditions will lead to correlations which are spuriously
high. Failure to take account of heterogeneity introduced by
the age factor is a prolific source of error in correlational work.
To cite an example, within a group of boys ten to eighteen
years old, a substantial correlation will appear between strength
of grip and memory span, quite apart from any intrinsic relationship,
due solely to the fact that both variables increase
with age. In stating the correlation between two tests, or the
reliability of a test, one should always be careful to […] and other
data bearing upon […] and cultural differences, in order to […]
heterogeneity in the group. Without this […] may be of little value.

Heterogeneity is introduced by other factors than age. If
alcoholism, degeneracy, and bad heredity are all positively
related, the correlation between alcoholism and degeneracy will
be too high (because of the effect of heredity upon both factors)
unless heredity can be "held constant." Again, assume that we
have measured two distinctly different groups, […] and 500 day
laborers, upon a cancellation test and an intelligence test. The
mean ability […] will be definitely higher in the college group.
If the correlation between the two tests is zero within each group
taken separately, a correlation will nevertheless appear if the two
groups are combined, because of the heterogeneity of the combined
group with respect to age, intelligence, and educational […]. This
correlation is, of course, spurious.*

To be a valid measure […], a correlation coefficient […] the
influences which affect the relationship […]. This may be
accomplished (1) by […] groups in which age (or […]) is constant;
or (2) […] to be controlled can be measured and its […]
calculated.


2. Spurious Index Correlation†

Even when three variables X1, X2, and X3 are uncorrelated, a correlation between the indices Z1 and Z2 (where Z1 = X1/X3 and Z2 = X2/X3) may appear which is as large as .50. To illustrate, if two individuals observe a series of magnitudes (e.g., Galton bar settings) independently, the absolute errors of observation (X1 and X2) may be uncorrelated, and still an appreciable correlation appear between the errors made by the two observers, when these are expressed as percents of the observed magnitudes (X3). The spurious element here, of course, is the common factor X3 in the denominator of the ratios.

* Garrett, H. E., and Anastasi, A., The Tetrad-Difference Criterion and the Measurement of Mental Traits, Annals New York Academy of Sciences, No. 33 (1932), 233-282.
† Yule, G. U., An Introduction to the Theory of Statistics (1932), pp. 215-
Thomson, G. H., and Pintner, R., "Spurious Correlation and Relationship Between Tests," Journal of Educational Psychology, 15 (1924), 433-444.
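A short simulation shows the effect. This is a sketch in Python under assumed values that are not from the text: three mutually independent normal variables, each with mean 100 and SD 10, a sample of 20,000 cases, and a fixed random seed; the helper `pearson` is written here for illustration.

```python
import random


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1947)          # fixed seed: reproducible run
N = 20_000
# Three independent variables with equal coefficients of variation (.10).
x1 = [rng.gauss(100, 10) for _ in range(N)]
x2 = [rng.gauss(100, 10) for _ in range(N)]
x3 = [rng.gauss(100, 10) for _ in range(N)]

z1 = [a / c for a, c in zip(x1, x3)]   # index Z1 = X1 / X3
z2 = [b / c for b, c in zip(x2, x3)]   # index Z2 = X2 / X3

r_raw = pearson(x1, x2)      # close to zero: X1 and X2 are independent
r_index = pearson(z1, z2)    # close to .50: produced by the common X3
print(f"r(X1, X2) = {r_raw:.3f}; r(Z1, Z2) = {r_index:.3f}")
```

With equal coefficients of variation the index correlation settles near .50; making X3 relatively more variable than X1 and X2 pushes it higher.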

One of the commonest examples of a spurious index relationship in psychology is found in the correlation of I.Q.'s or E.Q.'s obtained from intelligence and achievement tests. If the I.Q.'s of 500 children ranging in age from three to fourteen years are calculated from two tests X1 and X2, the correlation is between M.A.1/C.A. and M.A.2/C.A. If C.A. were a constant (the same for all children) it would have no effect on the correlation and we would simply be correlating M.A.'s. But when C.A. varies from child to child there is usually a correlation between C.A. and M.A. which tends to increase the r between I.Q.'s, sometimes considerably.


3. Spurious Correlation between Averages

Spurious correlation usually results when the average scores made by a number of different groups on a given test are correlated against the average scores made by the same groups on a second test. An example is furnished by the correlations reported by Bagley* between the mean Army Alpha scores, by states, and such "educational" factors as number of schools, books sold, magazines circulated in the states, etc. Most of these correlations are high, many above .90. If these correlations between state averages are compared with the correlations between intelligence scores and number of years spent in school within the separate states, these latter r's are usually much lower.

Correlations between averages become "inflated" because a large number of factors which ordinarily reduce the correlation within a single group cancel out when averages are taken from group to group. Average intelligence test scores, for instance, increase regularly as we go up the occupational scale from day laborer to the professions; but the correlation between intelligence and status (training, salary, etc.) at a given occupational level is far from perfect.

* Bagley, W. C., Determinism in Education (1925), p. 81.
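The mechanism can be sketched in Python with made-up data. Every number here is an assumption for illustration: 48 hypothetical "states" of 200 persons each, in which both variables share a state-level factor while individual variation is large and unrelated between the two variables.

```python
import random


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1925)
n_groups, n_per_group = 48, 200
all_x, all_y = [], []      # individual scores, pooled over groups
mean_x, mean_y = [], []    # one average per group
for _ in range(n_groups):
    g = rng.gauss(0, 1)    # group factor common to both variables
    xs = [g + rng.gauss(0, 3) for _ in range(n_per_group)]
    ys = [g + rng.gauss(0, 3) for _ in range(n_per_group)]
    all_x += xs
    all_y += ys
    mean_x.append(sum(xs) / n_per_group)
    mean_y.append(sum(ys) / n_per_group)

r_individuals = pearson(all_x, all_y)   # about .10
r_averages = pearson(mean_x, mean_y)    # about .95
print(f"individuals: {r_individuals:.2f}; averages: {r_averages:.2f}")
```

The individual deviations (SD 3) largely cancel within each group mean, so the averages are dominated by the shared factor and correlate highly even though individuals do not.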


PROBLEMS 


1. The correlation between a general intelligence test and school achievement in a group of children from eight to fourteen years old is .80. The correlation between the general intelligence test and age in the same group is .70; and the correlation between school achievement and age is .60. What is the correlation between general intelligence and school achievement in children of the same age? Comment upon your result.

2. In a group of 100 college freshmen, the correlation between (1) Army Alpha and (2) the A-cancellation test is .20. The correlation between (1) Army Alpha and (3) a battery of controlled association tests in the same group is .70. If the correlation between (2) cancellation and (3) controlled association is .45, what is the "net" correlation between Alpha and cancellation in this group? Between Alpha and controlled association? Interpret your results.

3. Explain why some variables are of such a nature that it is difficult to hold them "constant," and hence to employ them in problems involving partial correlation.

4. Given the following data for fifty-six children:

X1 = Stanford-Binet I.Q.
X2 = Memory for Objects
X3 = Cube Imitation

M1 = 101.71     M2 = 10.06     M3 = 3.35
σ1 = 13.65      σ2 = 3.06      σ3 = 2.02
r12 = .41       r13 = .50      r23 = .16

(a) Work out the regression equation of X1 upon X2 and X3, using the method of Section II.
(b) Compute R1.23 and σ(est X1).
(c) If a child's score is 12 in Test X2 and 4 in Test X3, what is his most probable score in X1 (I.Q.)?




5. Let X1 be a criterion and X2 and X3 be two other tests. Correlations and σ's are as follows:

r12 = .60     σ1 = 5.00
r13 = .50     σ2 = 10.00
r23 = .20     σ3 = 8.00

How much more accurately can X1 be predicted from X2 and X3 than from either alone?

6. Given a team of two tests, each of which correlates .50 with a criterion. If the two tests correlate .20:
(a) How much would the addition of another test which correlates .50 with the criterion and .20 with each of the other tests improve the predictive value of the team?
(b) How much would the addition of two such tests improve the predictive value of the team?

7. Two absolutely independent tests B and C completely determine the criterion A. If B correlates .50 with A, what is the correlation of C and A? What is the multiple correlation of A with B and C?

8. Comment upon the following statements:
(a) It is good practice to correlate E.Q.'s achieved upon two educational achievement tests, no matter how wide the age range.
(b) The positive correlation between average Army Alpha scores by states and the average elevation of the states above sea level proves the close relationship of intelligence and geography.
(c) The correlation between memory test scores and tapping rate in a group of 200 eight-year-old children is .20; and the correlation between memory test scores and tapping rate in a group of 100 college freshmen is .10. When the two groups are combined the correlation between these two tests becomes .40. This shows that we must have large groups in order to get high correlations.

ANSWERS

1. r12.3 = .67
2. r (Alpha and cancellation) = −.18; r (Alpha and controlled association) = .70
4. (a) X1 = 1.47X2 + 2.98X3 + 76.95
   (b) R1.23 = .60; σ(est X1) = 10.93
   (c) 106.50 or 107
5. From X2 alone, σ(est X1) = 4.0
   From X3 alone, σ(est X1) = 4.3
   From X2 and X3, σ(est X1) = 3.5
6. (a) R increases from .64 to .73
   (b) R increases from .64 to .79
7. rAC = .87; RA.BC = 1.00
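The answers to Problems 1, 2, and 5 can be checked with the usual three-variable formulas for the partial r, the multiple R, and the standard error of estimate. A minimal Python sketch; the helper names `partial_r` and `multiple_r` are hypothetical, chosen for this illustration.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """First-order partial r12.3 (variable 3 held constant)."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def multiple_r(r12, r13, r23):
    """Multiple R1.23 of variable 1 with variables 2 and 3."""
    return sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))


# Problem 1: 1 = intelligence, 2 = school achievement, 3 = age
p1 = partial_r(0.80, 0.70, 0.60)
# Problem 2: 1 = Alpha, 2 = cancellation, 3 = controlled association
p2_cancel = partial_r(0.20, 0.70, 0.45)
# Same formula with the roles of (2) and (3) exchanged
p2_assoc = partial_r(0.70, 0.20, 0.45)
# Problem 5: sigma(est X1) = sigma1 * sqrt(1 - r^2), or with R for two tests
sigma1 = 5.00
se_x2 = sigma1 * sqrt(1 - 0.60 ** 2)
se_x3 = sigma1 * sqrt(1 - 0.50 ** 2)
se_both = sigma1 * sqrt(1 - multiple_r(0.60, 0.50, 0.20) ** 2)

print(f"{p1:.2f} {p2_cancel:.2f} {p2_assoc:.2f}")
print(f"{se_x2:.1f} {se_x3:.1f} {se_both:.1f}")
```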




CHAPTER XIV 
MULTIPLE CORRELATION IN TEST SELECTION 


I. THE WHERRY-DOOLITTLE TEST SELECTION METHOD*


The method of solving multiple correlation problems outlined in Section II and Table 59 of Chapter XIII is adequate enough when there are only three (or not more than four) variables. In problems involving more than four variables, however, the mechanics of calculation become almost prohibitive unless some systematic scheme of solution is adopted. The Wherry-Doolittle Test Selection Method, to be presented in this section, provides a method of solving multiple correlation problems with a minimum of statistical labor. This method selects the tests of the battery analytically and adds them one at a time until a maximum R is obtained. To illustrate, suppose we wish to predict aptitude for a certain technical job in a factory. Criterion ratings for job proficiency have been obtained and eight tests tried out as possible indicators of job aptitude. By use of the Wherry-Doolittle method we can (1) select those tests (e.g., three or four) which yield a maximum R with the criterion and discard the rest; (2) calculate the multiple R after the addition of each test, stopping the process when R no longer increases; (3) compute a multiple regression equation from which the criterion can be predicted with the highest precision of which the given list of tests is capable.

The application of the Wherry-Doolittle test selection method to an actual problem is shown in Example (1) below. Steps in computation are outlined in order and are illustrated by reference to the data of Example (1), so that the reader may follow the process in detail.

* Stead, W. H., Shartle, C. L., et al., Occupational Counseling Techniques (1940), Appendix 5.

1. Solution of a Multiple Correlation Problem by the Wherry-Doolittle Test Selection Method


Example (1) In Table 62 are presented the intercorrelations of ten tests administered in the Minnesota study of Mechanical Ability. The criterion, called the "quality criterion," was a measure of the excellence of mechanical work done by 100 junior high-school boys. The tests in Table 62 are fairly representative of the wide range of measures used in the Minnesota study. Our immediate problem is to choose from among these variables the most valid battery of tests, i.e., those tests which will predict the criterion most accurately.


TABLE 62
INTERCORRELATIONS OF TEN TESTS AND A CRITERION
(Data from the Minnesota Study of Mechanical Ability)

List of Tests (N = 100)
C = Quality criterion
1 = Packing blocks
2 = Card sorting
3 = Minnesota spatial relations boards, A, B, C, D
4 = Paper form boards, A and B
5, 6 = ...
7 = Minnesota assembly boxes, A, B, C
8 = Mechanical operations questionnaire
9 = Interest analysis blank
10 = Otis intelligence test


      1    2    3    4    5    6    7    8    9    10
C 26 19 -53 52 ол 81 55 .30 .26 
1 52 134 714 18 21 530 .00 -00 
2 23 14 С 924 18 _ тә .08 
3 68 49 0% 56 22 -23 
4 37 .30 49 24 56 
5 54 46 24 11 
6 40 19 (21 
7 40 13 
8 18 
9 88 


Steps in the solution of Example (1) may be outlined in order.

* Paterson, D. G., Elliott, R. M., et al., Minnesota Mechanical Ability Tests (1930), Appendix 4.


Step 1
Draw up work sheets like those of Tables 63 and 64. The correlation coefficients between tests and criterion are given in Table 62.
Step 2
Enter these coefficients with signs reversed in the V1 row of Table 63.* The numbers heading the columns refer to the tests.


TABLE 63 
Tests 
1 2 3 4 5 6 7 8 9 10 
310 -.550 -.300 -.550 —.260 


—.190 
—.080 —.324 —.188 


-.118 
-.049 - —.061 
-.094 -.056 
-.099 -.018 
(—.550)*. (.057)*. .051)? 
1.000 ’ 480” 
= 8025 10066 
Step 3 
Enter the numbers 1.000 in each column of the row Z1 in
Table 64. 
TABLE 64 
Tests 
1 2 3 4 5 6 7 8 9 10 
21 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 
‘910 983 686 760 788 810 510 832 .983 
‘853 1945 563  .559 02786 .839 .831 854 
$^ 2859 931 7180 748 .782 829 “852 
8.7906 .927 І 437 715 .829 .637 
1/.832 = 1.202     1/.563 = 1.776     1/.489 = 2.045
* Correlation coefficients are assumed to be accurate to three or to four decimals in subsequent calculations to avoid the loss of precision which results when decimals are rounded to two places. (See p. 495.)
Step 4 


Select that test having the highest V1²/Z1 quotient as the first test of the battery. Since every Z1 entry is 1.000, this quotient is simply the squared correlation of the test with the criterion. From Tables 63 and 64 we find that Tests 7 and 9 both have correlations of .550 with the criterion, and that these are the largest r's in the table. Either Test 7 or Test 9 could be selected as the first test of our battery. We have chosen Test 7 because it is the more objective measure of performance.


Step 5
Apply the Wherry shrinkage formula*

$$\bar{R}^2 = 1 - K^2\,\frac{N - 1}{N - m}, \qquad K^2 = 1 - R^2$$

in which R̄ is the "shrunken" multiple correlation coefficient, the coefficient from which chance error has been removed; N is the number of cases and m the number of tests in the battery. This corrected R may be calculated in a systematic way as follows:


(1) Prepare a work sheet similar to that shown in Table 65. 


TABLE 65

 a     b        c        d              e        f        g
 m     V²/Z     K²       (N−1)/(N−m)    K̄²       R̄²       R̄        Test
 0              1.000    (N = 100)
 1     .3025    .6975    1.000          .6975    .3025    .5500    7
 2     .1261    .5714    1.010          .5771    .4229    .6503    9
 3     .0167    .5547    1.021          .5663    .4337    .6586    3
 4     .0066    .5481    1.031          .5651    .4349    .6595    4
 5     .0031    .5450    1.042          .5679    .4321    .6573    8


(2) Enter 1.000 in column c, row 0, under K². Enter N = 100 in column d.

* Wherry, R. J., A New Formula for Predicting the Shrinkage of the Coefficient of Multiple Correlation, Annals of Mathematical Statistics (1931), 440-451.


BS — .-- 


(3 


(4 


Step 6 


= 


— 


= 


MULTIPLE CORRELATION IN TEST SELECTION 439 


Enter the quotient Ls in eolumn b, row 1. т 
1 1 
(850)? aios 
img ~ ^ 


Subtract .3025 from 1.000 to give .6975 as the entry in 
column e under А?. 


" Р |—1 
Find the quotient = and record it in column d. 


(N — 1) = 99; and since m (number of tests selected) is 1, 


N- 
(№ — m) also equals 99 and (N =1) = 1.0000. 
(N — m) 


Write the product of columns e and d in column e: 
6975 x 1.000 = .6975. 

Subtraet the column e entry from 1.0000 to obtain n 
(the shrunken multiple correlation coefficient) in column 
f. In Table 65 the 2 entry, of course, is .3025. 

Find the square root of the column f entry and enter the 
result in column g under R. Our entry is .5500, the cor- 
relation of Test 7 with the criterion. № correction for 


chance errors is necessary for one test. 


То aid in the selection of a second test to be added to our 


battery of one, a work sheet simil 
should be prepared. 


(1) Leave a; row blank. 
(2) Enter in row bı the correl 


(3 


=> 


ar to that shown in Table 66 
Calculations in Table 66 are ав follows: 


ations of Test 7 (first selected 


test) with each of the other tests in Table 62. These r’s 


are .300, .130, .560, еїс., and are entered in the columns 
numbered to correspond to the tests. Enter 1.000 in 
the column for Test 7. In column — C enter the correla- 
tion of Test 7 with the criterion with sign reversed, i.e., 
as — .550. 

Write the algebraic sum of the bı entries in the “Check 


Sum” column. This sum is 3.730. 
* Quotient is taken to four decimals (p. 455). 




TABLE 66

          1      2      3      4      5      6      7       8      9      10      −C     Check Sum
a1     (blank)
b1      .300   .130   .560   .490   .460   .400   1.000   .400   .410   .130   −.550     3.730
c1     −.300  −.130  −.560  −.490  −.460  −.400  −1.000  −.400  −.410  −.130    .550    −3.730

(Rows a2 to c4, for Tests 9, 3, and 4, are not legible in the source; the entries needed are quoted in Steps 11, 14, and 16.)


(4) Multiply each b1 entry by the negative reciprocal of the b1 entry for Test 7, the first selected test. Enter these products in the c1 row. Since the negative reciprocal of Test 7's b1 entry is −1.000, we need simply write the b1 entries in the c1 row with signs reversed.


Step 7
Draw a vertical line under Test 7 in Table 63 to show that it has been selected. To select a second test proceed as follows:
(1) To each V1 entry in Table 63, add algebraically the product of the b1 entry in the criterion (−C) column of Table 66 by the c1 entry for each of the other tests. Enter results in the V2 row. The formula for V2 is V2 = V1 + b1 (criterion) × c1 (each test). To illustrate, from Table 66 and Table 63 we have
For Test 1: V2 = −.260 + (−.550) × (−.300) = −.260 + .165 = −.095
For Test 4: V2 = −.520 + (−.550) × (−.490) = −.520 + .270 = −.250
For Test 9: V2 = −.550 + (−.550) × (−.410) = −.550 + .226 = −.324
(2) To each Z1 in Table 64 add algebraically the product of the b1 and c1 entries for each test got from Table 66. Enter these results in the Z2 row. The formula is Z2 = Z1 + b1 (a given test) × c1 (same test). To illustrate, from Tables 64 and 66
For Test 1: Z2 = 1.000 + (.300) × (−.300) = 1.000 − .090 = .910
For Test 4: Z2 = 1.000 + (.490) × (−.490) = 1.000 − .240 = .760
For Test 9: Z2 = 1.000 + (.410) × (−.410) = 1.000 − .168 = .832
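Step 7 (and the choice made in Step 8) for the three tests worked above can be written as a few lines of Python. The sketch uses only the entries quoted for Tests 1, 4, and 9, so the largest quotient is found among those three only; the dictionary names are illustrative.

```python
# Entries quoted in the text (Tables 63, 64, and 66).
b1 = {1: 0.300, 4: 0.490, 9: 0.410, "C": -0.550}  # b1 row: r's of Test 7
c1 = {k: -v for k, v in b1.items()}                # c1 = b1 x (-1/1.000)
V1 = {1: -0.260, 4: -0.520, 9: -0.550}             # criterion r's, sign reversed
Z1 = {1: 1.000, 4: 1.000, 9: 1.000}

# V2 = V1 + b1(criterion) x c1(each test);  Z2 = Z1 + b1(test) x c1(test)
V2 = {t: V1[t] + b1["C"] * c1[t] for t in V1}
Z2 = {t: Z1[t] + b1[t] * c1[t] for t in Z1}

# Step 8: the next test is the one with the largest V2^2 / Z2
quotients = {t: V2[t] ** 2 / Z2[t] for t in V2}
best = max(quotients, key=quotients.get)

for t in V2:
    print(f"Test {t}: V2 = {V2[t]:.3f}, Z2 = {Z2[t]:.3f}, V2^2/Z2 = {quotients[t]:.4f}")
print("next test:", best)
```

Carried at full precision, the Test 9 quotient comes out about .1266 rather than the .1261 obtained from the rounded entries −.324 and .832.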


Step 8
Now select the test having the largest V2²/Z2 quotient as the second test for our battery. The quantity V2²/Z2 is a measure of the amount which the second test contributes to the squared multiple correlation coefficient, R². From Tables 63 and 64 we find that Test 9 has the largest V2²/Z2 quotient: (−.324)²/.832 = .1261.
Step 9
To calculate the new multiple correlation coefficient when Test 9 is added to Test 7, proceed as follows:
(1) The quantity .1261 (V2²/Z2) is entered in column b, row 2 of Table 65.
(2) Subtract the ratio V2²/Z2 from the K² entry in column c, row 1, and enter the result in column c, row 2; e.g., for the entry in column c, row 2, we have .6975 − .1261, or .5714.
(3) Find the quotient (N − 1)/(N − m). Since N = 100 and m (number of tests chosen) = 2, we have (N − 1)/(N − m) = 99/98 = 1.010, as the column d, row 2 entry.
(4) Record the product of the c and d columns in column e: .5714 × 1.010 = .5771.
(5) Subtract .5771 (column e) from 1.0000 to give .4229 as the entry in column f, row 2.
(6) Take the square root of .4229 and enter the result, .6503, in column g. This is the multiple coefficient R̄ corrected for chance errors. It is clear that by adding Test 9 to Test 7 we increase R̄ from .5500 to .6503, a substantial gain.
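The column arithmetic of Table 65 (Steps 5 and 9, repeated in Steps 13, 15, and 17) is mechanical enough to script. This Python sketch takes the V²/Z quotients as entered in column b and, like the text, rounds the ratio (N − 1)/(N − m) to three decimals and the other entries to four; the variable names are illustrative.

```python
from math import sqrt

N = 100
# (test, V^2/Z quotient) in the order of selection, column b of Table 65
quotients = [(7, 0.3025), (9, 0.1261), (3, 0.0167), (4, 0.0066), (8, 0.0031)]

K2 = 1.000          # column c, row 0
R_bar = []          # column g, one entry per row
for m, (test, q) in enumerate(quotients, start=1):
    K2 = round(K2 - q, 4)                 # column c
    ratio = round((N - 1) / (N - m), 3)   # column d
    K2_bar = round(K2 * ratio, 4)         # column e
    R2_bar = round(1 - K2_bar, 4)         # column f: shrunken R^2
    R_bar.append(round(sqrt(R2_bar), 4))  # column g: shrunken R
    print(m, test, K2, ratio, K2_bar, R2_bar, R_bar[-1])

# Keep tests only up to the row where the shrunken R is largest.
best = max(range(len(R_bar)), key=R_bar.__getitem__)
battery = [t for t, _ in quotients[: best + 1]]
print("battery:", battery)
```

Rounding each column as the text does reproduces column g exactly; the fall at row 5 is what ends the selection in Step 17.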
Step 10
Since R̄ for Tests 7 and 9 is larger than the correlation for Test 7 alone, we proceed to add a third test in the hope of further increasing the multiple R̄. The procedure is shown in Step 11.


Step 11
Return to Table 66 and
(1) Record in the a2 row the correlation coefficient of the second selected test (i.e., Test 9) with each of the other tests and with the criterion. (Read r's from Table 62.) The correlation of Test 9 with the criterion is entered with sign reversed (i.e., as −.550).
(2) Enter the algebraic sum of the a2 entries (i.e., 3.580) in the Check Sum column.
(3) Draw a vertical line down through the b2 and c2 rows for Test 7, the first selected test. This indicates that Test 7 has already been chosen.
(4) Compute the b2 entry for each test by adding to the a2 entry the product of the b1 entry of the given test by the c1 entry of the second selected test (i.e., Test 9). The formula is b2 = a2 + b1 (given test) × c1 (second selected test). To illustrate:
For Test 2: b2 = .230 + (.130)(−.410) = .230 − .053 = .177
For Test 6: b2 = .130 + (.400)(−.410) = .130 − .164 = −.034
For Test 10: b2 = .380 + (.130)(−.410) = .380 − .053 = .327
Compute b2 entries for the criterion and Check Sum columns in the same way. For the criterion column we have −.550 + (−.550)(−.410), or −.324. For the Check Sum column we have 3.580 + (3.730)(−.410), or 2.051.
(5) There are three checks for the b2 row. (a) The entry for the second selected test (Test 9) should equal the Z2 entry for the same test in Table 64. Note that both entries are .832. (b) The entry in the criterion column should equal the V2 entry of the second selected test (Test 9) in Table 63; both entries are −.324. (c) The entry in the Check Sum column should equal the sum of all of the entries in the b2 row. Adding .217, .177, .320, etc., we get 2.051, checking our calculations to the third decimal.
(6) Multiply each b2 entry by the negative reciprocal of the b2 entry for the second selected test (Test 9), and record results in the c2 row. The negative reciprocal of .832 is −1.202. The c2 entry for Test 1 is .217 × −1.202, or −.261; for Test 2, .177 × −1.202, or −.213; and so on for the other tests. For the criterion column the c2 entry is (−.324) × −1.202, or .389; and for the Check Sum the c2 entry is 2.051 × −1.202, or −2.465.
(7) There are three checks for the c2 entries. (a) The c2 row entry of the second selected test (Test 9) should be −1.000. (b) The c2 entry in the Check Sum column should equal the sum of all c2 entries. Adding the c2 entries in Table 66, we find the sum to be −2.465, the Check Sum entry. (c) The product of the b2 and c2 entries in the criterion column should equal the quotient V2²/Z2 in column b, row 2 of Table 65 in absolute value. Note that the product (−.324 × .389) = −.1261, thus checking our entry (disregard signs).
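The arithmetic of Step 11 for Tests 2, 6, and 10, the criterion, and the Check Sum can be reproduced as follows. The inputs are the entries quoted in the steps above; the Python names are illustrative, and only this subset of columns is shown.

```python
# a2: correlations of Test 9 (second selected test); criterion sign reversed.
a2 = {2: 0.230, 6: 0.130, 10: 0.380, "C": -0.550, "sum": 3.580}
# b1: the Test 7 row of Table 66, same columns.
b1 = {2: 0.130, 6: 0.400, 10: 0.130, "C": -0.550, "sum": 3.730}
c1_test9 = -0.410        # c1 entry of the second selected test

# b2 = a2 + b1(given column) x c1(second selected test)
b2 = {k: a2[k] + b1[k] * c1_test9 for k in a2}

b2_test9 = 0.832         # b2 entry of Test 9 (equals its Z2, check (a))
neg_recip = -1 / b2_test9          # about -1.202
c2 = {k: v * neg_recip for k, v in b2.items()}

# Check (c) of item (7): |b2 x c2| in the criterion column is V2^2 / Z2
check = abs(b2["C"] * c2["C"])
for k in b2:
    print(k, f"b2 = {b2[k]:.3f}", f"c2 = {c2[k]:.3f}")
print(f"check = {check:.4f}")
```

Unrounded, the criterion entries come out −.3245 and .390, which agree with the text's −.324 and .389 to within rounding.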
Step 12
Draw a vertical line under Test 9 in Table 63, to indicate that it has been selected as our second test. Then proceed as in Step 7 to compute V3 and Z3 in order to select a third test. The formula for V3 is V3 = V2 + b2 (criterion) × c2 (each test). The formula for Z3 is Z3 = Z2 + b2 (a given test) × c2 (same test).

The third selected test is that one which has the largest V3²/Z3 quotient (Tables 63 and 64). This is Test 3, for which V3 = −.222 + (−.324)(−.385), or −.097; and Z3 = .686 + (.320)(−.385) = .563. The quotient V3²/Z3 = .0167.

Step 13
Entering .0167 (V3²/Z3) in column b, row 3, of Table 65, follow the procedure of Step 9 to get R̄ = .6586. Note that (N − 1)/(N − m) = 99/97, or 1.021; and that the new R̄ is larger than the .6503 found for the two tests, 7 and 9. We include Test 3 in our battery, therefore, and proceed to calculate a3, b3, and c3 (Table 66), following Step 11, in order to select a fourth test.


Step 14
The a3 entries in Table 66 are the correlations of Test 3 with each of the other tests including the criterion. The criterion correlation is entered in the −C column with a negative sign (i.e., as −.530).
(1) The formula for b3 is b3 = a3 + b1 (given test) × c1 (third selected test) + b2 (given test) × c2 (third selected test). To illustrate,
For Test 1: b3 = .340 + (.300)(−.560) + (.217)(−.385) = .088
For Test 4: b3 = .630 + (.490)(−.560) + (.409)(−.385) = .199
Check the b3 entries by Step 11 (5). (a) Note that the b3 entry for the third selected test (Test 3) equals the Z3 entry for Test 3 in Table 64, namely, .563. (b) The entry in the criterion column equals the V3 entry of the third selected test (Test 3) in Table 63, i.e., −.097. (c) The Check Sum entry (1.161) equals the sum of the entries in the b3 row.
(2) The formula for c3 is b3 × the negative reciprocal of the b3 entry for the third selected test (Test 3). The negative reciprocal of .563 is −1.776. To illustrate the calculation for Test 5, c3 = .146 × −1.776 = −.259. Check the c3 entries by Step 11 (7). (a) The c3 row entry of the third selected test (Test 3) equals −1.000. (b) The c3 entry in the Check Sum column, namely, −2.062, equals the sum of the c3 row. (c) The product of the b3 and c3 entries in the criterion column (namely, −.097 × .172) equals the quotient V3²/Z3 (i.e., .0167) in absolute value.


Step 15
Repeat Step 12 to find V4 and Z4. The formula for V4 is V4 = V3 + b3 (criterion) × c3 (each test). Also, the formula for Z4 is Z4 = Z3 + b3 (a given test) × c3 (same test). For Test 4, V4 = −.091 + (−.097)(−.353), or −.057; and Z4 = .559 + (.199)(−.353), or .489. The quotient V4²/Z4 equals (−.057)²/.489, or .0066. While none of the V4 entries is large, Test 4 has the largest V4²/Z4 quotient, and hence is selected as our fourth test.

Enter .0066 (V4²/Z4) in column b, row 4, of Table 65. Follow the procedure of Step 9 to get R̄ = .6595. Note that (N − 1)/(N − m) is 99/96, or 1.031; and that the new R̄ is but slightly larger than the R̄ of .6586 found for the three tests, 7, 9, and 3. When R̄ decreases or fails to increase, there is no point in adding new tests to the battery. The increase in R̄ as a result of adding Test 4 is so small that it is hardly profitable to enlarge our battery by a fifth test. We shall add a fifth test, however, in order to illustrate a further step in the selection process.


Step 16
To choose a fifth test, calculate a4, b4, and c4, following Step 11, and enter the results in Table 66. The a4 entries are the correlations of the fourth selected test (Test 4) with each of the other tests including the criterion (with sign reversed).
(1) The formula for b4 may readily be written by analogy to the formulas for b2 and b3 as follows: b4 = a4 + b1 (given test) × c1 (fourth selected test) + b2 (given test) × c2 (fourth selected test) + b3 (given test) × c3 (fourth selected test). To illustrate,
For Test 6: b4 = .300 + (.400)(−.490) + (−.034)(−.492) + (.179)(−.353) = .058
For Test 10: b4 = .560 + (.130)(−.490) + (.327)(−.492) + (.031)(−.353) = .324
Check the b4 entries by Step 11 (5). (a) The b4 entry for the fourth selected test (Test 4) equals the Z4 entry for Test 4 in Table 64, namely, .489. (b) The entry in the criterion column equals the V4 entry of the fourth selected test (Test 4), i.e., −.057. (c) The Check Sum (.715) equals the sum of the entries in the b4 row.
(2) To find the entries c4, multiply each b4 by the negative reciprocal of the b4 entry for the fourth selected test (Test 4). The negative reciprocal of .489 is −2.045. To illustrate,
For Test 1: c4 = −.145 × −2.045 = .297.
Check the c4 entries by Step 11 (7). (a) The c4 row entry of the fourth selected test (Test 4) equals −1.000. (b) The c4 entry in the Check Sum column, namely, −1.462, equals the sum of the c4 row. (c) The product of the b4 and c4 entries in the criterion column (namely, −.057 × .117) equals the quotient V4²/Z4 (i.e., .0066) in absolute value.


Step 17
Repeat Step 12 to find V5 and Z5: V5 = V4 + b4 (criterion) × c4 (each test); and Z5 = Z4 + b4 (a given test) × c4 (same test). Test 8 has the largest V5²/Z5 quotient (i.e., .0031) and this number is entered in column b, row 5 of Table 65. Following Step 9, we get R̄ = .6573. This multiple correlation coefficient is smaller than the preceding R̄. We need go no further, therefore, as we have reached the point of diminishing returns and the addition of a sixth test will not increase the multiple R̄. It may be noted that four (really three) tests constitute a battery which has the highest validity of any combination of tests chosen from our list of ten. The multiple R between the criterion and all ten tests would be somewhat lower, when corrected for chance error, than the R̄ we have found for our battery of four tests. The Wherry-Doolittle method not only selects the most economical battery but saves a large amount of statistical work.
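The whole procedure of Steps 1 to 17 can be condensed into one function. The following Python sketch is an illustration written for this section, not Wherry's or Doolittle's own layout. It keeps the V and Z rows and the b and c rows of Table 66 as lists, takes at each stage the test with the largest V²/Z (ties go to the lowest index, whereas the text chose Test 7 over Test 9 on other grounds), and stops as soon as the shrunken R̄ fails to rise. The name `wherry_doolittle` and its arguments are hypothetical. The usage lines try it on two small cases whose answers appear in the text: the four equal tests of Problem 6 of Chapter XIII, and Tests 7 and 9 of Example (1).

```python
from math import sqrt


def wherry_doolittle(R, r_c, N):
    """Select tests one at a time in the manner of Steps 1-17.

    R   : intercorrelations of the tests (list of lists, 1.0 on the diagonal)
    r_c : correlation of each test with the criterion
    N   : number of cases
    Returns (selected test indices, shrunken R-bar after each,
             uncorrected R^2 after each).
    """
    n = len(r_c)
    V = [-r for r in r_c]     # V row: criterion r's with sign reversed
    Z = [1.0] * n             # Z row
    done = []                 # (b row, b criterion, c row) for each selected test
    selected, R_bar, R2 = [], [], []
    K2 = 1.0                  # column c of Table 65
    while len(selected) < n:
        free = [j for j in range(n) if j not in selected and Z[j] > 1e-12]
        if not free:
            break
        s = max(free, key=lambda j: V[j] ** 2 / Z[j])   # ties: lowest index
        q = V[s] ** 2 / Z[s]
        m = len(selected) + 1
        r_bar = sqrt(max(0.0, 1 - (K2 - q) * (N - 1) / (N - m)))
        if R_bar and r_bar <= R_bar[-1]:
            break             # shrunken R no longer increases: stop
        K2 -= q
        selected.append(s)
        R_bar.append(r_bar)
        R2.append(1 - K2)
        # a row = correlations of the new test; b = a + sum of b_i x c_i(new)
        b = [R[s][k] for k in range(n)]
        b_c = -r_c[s]
        for pb, pb_c, pc in done:
            f = pc[s]
            b = [b[k] + pb[k] * f for k in range(n)]
            b_c += pb_c * f
        neg = -1.0 / b[s]     # negative reciprocal of the new test's b entry
        c = [x * neg for x in b]
        done.append((b, b_c, c))
        # V(next) = V + b(criterion) x c(each test); Z(next) = Z + b x c
        V = [V[k] + b_c * c[k] for k in range(n)]
        Z = [Z[k] + b[k] * c[k] for k in range(n)]
    return selected, R_bar, R2


# Four tests, each correlating .50 with the criterion and .20 with one another
R4 = [[1.0 if i == j else 0.2 for j in range(4)] for i in range(4)]
sel4, R_bar4, R2_4 = wherry_doolittle(R4, [0.5] * 4, N=10_000)
print(sel4, round(sqrt(R2_4[2]), 2), round(sqrt(R2_4[3]), 2))

# Tests 7 and 9 of Example (1): .55 each with the criterion, .41 with each other
sel2, R_bar2, _ = wherry_doolittle([[1.0, 0.41], [0.41, 1.0]], [0.55, 0.55], N=100)
print(sel2, round(R_bar2[-1], 4))
```

The first case gives an uncorrected R of .73 for three tests and .79 for four, as in the answers to Problem 6; the second gives R̄ of about .6506 against .6503 in Table 65, the small difference coming from the rounding of intermediate entries in the text.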


2. Calculation of the Multiple Regression Equation for Tests Selected by the Wherry-Doolittle Method

Steps involved in setting up a multiple regression equation for the tests selected in Table 63 may be set down as follows:

TABLE 67

         7        9        3        4       −C
C1    −1.000   −.410    −.560    −.490    .550
C2             −1.000   −.385    −.492    .389
C3                      −1.000   −.353    .172
C4                               −1.000   .117

Step 1
Draw up a work sheet, like that shown in Table 67. Enter the c entries for the four selected tests (namely, 7, 9, 3, and 4) and for the criterion, following the order in which the tests were selected for the battery. When equated to zero, each row in Table 67 is an equation defining the beta weights (β1 for Test 7, β2 for Test 9, β3 for Test 3, β4 for Test 4).

For our four tests, the equations are

$$\begin{aligned}
-1.000\beta_1 - .410\beta_2 - .560\beta_3 - .490\beta_4 + .550 &= 0\\
-1.000\beta_2 - .385\beta_3 - .492\beta_4 + .389 &= 0\\
-1.000\beta_3 - .353\beta_4 + .172 &= 0\\
-1.000\beta_4 + .117 &= 0
\end{aligned}$$

Step 2
Solve the fourth equation to find β4 = .117.
Step 3
Substitute β4 = .117 in the third equation to get β3 = .131.
Step 4
Substitute for β4 and β3 in the second equation to get β2 = .280. Finally, substitute for β4, β3, and β2 in the first equation to get β1 = .305.
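Steps 2 to 4 are a back substitution through the triangular system of Table 67. A Python sketch (the function name `back_substitute` is illustrative); carried without intermediate rounding it gives β weights that differ from the text's in the third decimal at most.

```python
# Rows of Table 67 in selection order (Tests 7, 9, 3, 4). Each row lists the
# coefficients of the remaining betas followed by the criterion (-C) entry;
# set equal to zero, each row is one equation.
table_67 = [
    [-1.000, -0.410, -0.560, -0.490, 0.550],
    [-1.000, -0.385, -0.492, 0.389],
    [-1.000, -0.353, 0.172],
    [-1.000, 0.117],
]


def back_substitute(rows):
    """Solve the triangular system from the last equation upward."""
    k = len(rows)
    betas = [0.0] * k
    for i in range(k - 1, -1, -1):
        row = rows[i]
        # row[1:-1] multiplies the betas already found (i+1 ... k-1)
        known = sum(c * b for c, b in zip(row[1:-1], betas[i + 1:]))
        betas[i] = -(known + row[-1]) / row[0]
    return betas


betas = back_substitute(table_67)   # order: Tests 7, 9, 3, 4
print([round(b, 3) for b in betas])
```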


Step 5
The regression equation for predicting the criterion from the four selected tests (7, 9, 3, and 4) may be written in standard score form by means of formula (95), page 421, as follows:

$$z_c = \beta_1 z_7 + \beta_2 z_9 + \beta_3 z_3 + \beta_4 z_4$$

in which β1, β2, β3, and β4 are the beta weights of Tests 7, 9, 3, and 4, respectively. Substituting for the β's we have

$$z_c = .305z_7 + .280z_9 + .131z_3 + .117z_4$$

To predict the criterion score of any subject in our group, substitute his scores in Tests 7, 9, 3, and 4 (expressed as σ-scores) in this equation.


Step 6
To write the regression equation in score form the β's must be transformed into b's by means of formula (94), page 421, as follows:

$$b_1 = \frac{\sigma_c}{\sigma_7}\beta_1, \quad b_2 = \frac{\sigma_c}{\sigma_9}\beta_2, \quad b_3 = \frac{\sigma_c}{\sigma_3}\beta_3, \quad b_4 = \frac{\sigma_c}{\sigma_4}\beta_4$$

In general, bp = (σc/σp)βp. The σ's are the SD's of the test scores: σ7 of Test 7, σ9 of Test 9, σc of the criterion, etc.

The regression equation in score form may now be written*

$$X_c = b_1X_7 + b_2X_9 + b_3X_3 + b_4X_4 + K \qquad \text{(90), p. 419}$$

and

$$\sigma_{(\text{est } X_c)} = \sigma_c\sqrt{1 - R^2_{c(7934)}} \qquad \text{(58), p. 320}$$

* This equation is not written for our four tests because means and SD's are not given in Table 62.
3. Checking the β Weights and Multiple R

Step 1
The β weights may be checked by formula (99), page 425, in which R² is expressed in terms of beta coefficients. In the present example, we have

$$R^2_{c(7934)} = \beta_1 r_{c7} + \beta_2 r_{c9} + \beta_3 r_{c3} + \beta_4 r_{c4}$$

in which c equals the criterion and the r's are the correlations between the criterion (c) and Tests 7, 9, 3, and 4. Substituting for the r's and β's (computed in the last section) we have

$$R^2_{c(7934)} = .305 \times .550 + .280 \times .550 + .131 \times .530 + .117 \times .520 = .1678 + .1540 + .0694 + .0608 = .4520$$

$$R_{c(7934)} = .6723$$

From R²c(7934) we know that our battery accounts for 45% of the variance of the criterion. Also (p. 426) our four tests (7, 9, 3, and 4) contribute 17%, 15%, 7%, and 6%, respectively, to the variance of the criterion.
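Formula (99) and the breakdown of the criterion variance can be checked in a few lines of Python. The numbers are those of the text, and each product β × r is read, as on p. 426, as that test's share of the criterion variance; the variable names are illustrative.

```python
# Beta weights (Section 2, Step 4) and criterion r's for Tests 7, 9, 3, 4
betas = {7: 0.305, 9: 0.280, 3: 0.131, 4: 0.117}
r_crit = {7: 0.550, 9: 0.550, 3: 0.530, 4: 0.520}

# Formula (99): R^2 = sum of beta x r over the tests in the battery
contrib = {t: betas[t] * r_crit[t] for t in betas}
R2 = sum(contrib.values())
R = R2 ** 0.5

print(f"R^2 = {R2:.4f}, R = {R:.4f}")
for t, c in contrib.items():
    print(f"Test {t}: {100 * c:.0f}% of criterion variance")
```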
Step 2
The R² of .4520 calculated above should equal (1 − K²) when K² is taken from column c, row 4 in Table 65. From Table 65 we find that 1 − K² = 1 − .5481, or .4519, which checks the R² found above (and hence the β weights) very closely.

Step 3
It will be noted that the R of .6723 found above is somewhat larger than the R̄ of .6595 found between the criterion and the four selected tests in Table 65. A multiple R calculated from a sample is inflated by chance errors, especially when N is small or the number of test variables large, and it must be "adjusted" in order to estimate the correlation in the population.* The relationship of R̄, the R corrected for chance errors, to the R as usually calculated, is given by the following equation:

$$\bar{R}^2 = \frac{(N - 1)R^2 - (m - 1)}{N - m} \qquad (100)$$

(relation of R̄ to R, R̄ being corrected for chance errors)

Substituting .4520 for R², 99 for (N − 1), 96 for (N − m), and 3 for (m − 1), we have from (100) that

$$\bar{R}^2 = \frac{99 \times .4520 - 3}{96} = .4349, \qquad \bar{R} = .6595 \text{ (see Table 65)}$$

The R̄ of .6595 is the corrected multiple correlation between our criterion and test battery, or the multiple correlation coefficient estimated for the population from which our sample was drawn. In the present problem, shrinkage in multiple R is quite small (.6723 − .6595 = .0128) as the sample is fairly large and there are only four tests in the multiple regression equation.
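Formula (100) and the form used in Step 5 and Table 65, R̄² = 1 − K²(N − 1)/(N − m) with K² = 1 − R², are algebraically the same. A short Python sketch (function names illustrative) computes both for the present data.

```python
from math import sqrt


def shrunken_r2(R2, N, m):
    """Formula (100): R-bar^2 = [(N - 1) R^2 - (m - 1)] / (N - m)."""
    return ((N - 1) * R2 - (m - 1)) / (N - m)


def shrunken_r2_table65(R2, N, m):
    """Table 65 form: 1 - (1 - R^2)(N - 1) / (N - m), with K^2 = 1 - R^2."""
    return 1 - (1 - R2) * (N - 1) / (N - m)


R2, N, m = 0.4520, 100, 4
R2_bar = shrunken_r2(R2, N, m)
print(f"R-bar^2 = {R2_bar:.4f}, R-bar = {sqrt(R2_bar):.4f}")
print(f"shrinkage in R = {sqrt(R2) - sqrt(R2_bar):.4f}")
```

Unrounded, the shrinkage comes out near .0129; the text's .0128 is the difference of the already-rounded .6723 and .6595.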


IL APPLICATIONS оғ PARTIAL AND MULTIPLE 


CORRELATION 


1. Partial Correlation in Analysis 

Partial correlation can be a valuable aid in analyzing the part each of several factors plays in determining a total result. An illustration may be cited from the work of Cyril Burt.‡ Burt wished to find to what extent a child's M.A., as measured by the Binet test, influences his school attainment. His subjects were 300 children, seven to fourteen years old. For each child, Burt determined (1) an M.A., (2) his scholastic achievement, as measured by educational examinations and checked by teachers, and (3) his chronological age. The correlation between Binet M.A. and scholastic achievement (r₁₂) was .91. When chronological age (3) was partialled out, the correlation (r₁₂.₃) between Binet M.A. and scholastic achievement dropped to .68. This result shows, in the first place, that chronological age has a decided effect upon the correlation between M.A. and school work: it tends to increase or "dilate" the obtained r. This dilation occurs because both M.A. and school attainment increase with C.A., and this common dependence on chronological age serves to boost the observed correlation. The residual partial correlation (r₁₂.₃) of .68 indicates, however, that a substantial relationship remains between M.A. and school work when age is a constant factor. In other words, Binet M.A. is a substantial factor in a pupil's school attainment at each age level from seven to fourteen. Taking the analysis a step further, Burt found that the correlation between (2) school work and (3) chronological age (r₂₃) was .87. When Binet M.A. was held constant, the partial r (r₂₃.₁) between school work and C.A. was reduced to .49. A substantial relationship between school work and C.A. persists even when variability arising from differences in M.A. is eliminated. According to Burt, this offers confirmatory evidence of the "undue influence of age upon school classification."

* Ezekiel, M., Methods of Correlation Analysis (1941), pp. 323-324.
† Wherry, op. cit., p. 451.
‡ Burt, Cyril, Mental and Scholastic Tests (London: 1921), pp. 180-184.
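The first-order partial r used in Burt's analysis can be computed from the three zero-order coefficients. The text does not report r₁₃ (M.A. with C.A.). The sketch below therefore assumes r₁₃ = .833. We chose this value only because it reproduces both of Burt's published partials (.68 and .49). It is illustrative, not Burt's figure.

```python
import math

def partial_r(r_ab: float, r_ac: float, r_bc: float) -> float:
    """First-order partial correlation r_ab.c: a with b, c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Burt's variables: 1 = Binet M.A., 2 = school attainment, 3 = C.A.
r12, r23 = 0.91, 0.87
r13 = 0.833   # ASSUMED: not reported in the text; chosen for illustration only

print(round(partial_r(r12, r13, r23), 2))   # r12.3 -> 0.68
print(round(partial_r(r23, r12, r13), 2))   # r23.1 -> 0.49
```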
In analyses made through the elimination of factors by partial correlation, "causal" relationships may often be determined. […] Phillips,* in a study of absence on account of illness among […] employees over a period of a year, found that the observed correlation between absence and mean temperature on the day of absence was −.37. Four factors were then held constant: (1) relative humidity at 8:00 A.M.; (2) relative humidity at noon of the previous day; (3) inches of rainfall on the day of absence; and (4) percent of possible sunshine on the day of absence. With these factors held constant, the partial correlation remaining between absence and temperature was −.39. This partial correlation between absence and temperature was the only r not reduced by the elimination of other factors. The conclusion seems to be, therefore, that of the factors studied, temperature on the day of absence is the most important contributing cause of absence. Illness, of course, must be taken as the primary cause of absence. It must be clearly understood that partial correlation has nothing to say about causal relations. When all one has is the correlation between two variables, one cannot say which is the cause and which the effect. Sometimes, however, cause-and-effect distinctions are a matter of common-sense analysis. In the illustration given above, for instance, the distinction between cause and effect is clear.

* Phillips, F. E., "Application of […] Correlation to a Health Problem," Public Health Report, […].
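A partial with several variables held constant can be built up one variable at a time from lower-order partials. Phillips's coefficient, with four weather factors partialled out, is of this kind. The recursive sketch below assumes a complete matrix of zero-order r's. The 4 × 4 matrix in it is hypothetical, since Phillips's intercorrelations are not given here. Because the result does not depend on the order in which the controls are removed, that order makes a convenient check.

```python
import math

def partial(R, i, j, controls):
    """Partial r between variables i and j, holding every index in `controls`
    constant. R is a full matrix (list of lists) of zero-order r's. Each call
    removes the last control variable k from the lower-order partials."""
    if not controls:
        return R[i][j]
    k, rest = controls[-1], list(controls[:-1])
    r_ij = partial(R, i, j, rest)
    r_ik = partial(R, i, k, rest)
    r_jk = partial(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

# Hypothetical intercorrelations of four variables (not Phillips's data).
R = [
    [1.00, -0.35, 0.30, 0.20],
    [-0.35, 1.00, 0.40, 0.10],
    [0.30, 0.40, 1.00, 0.25],
    [0.20, 0.10, 0.25, 1.00],
]
print(round(partial(R, 0, 1, [2, 3]), 3))   # second-order partial r01.23
```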

Another example of the use of partial correlation in a "causal" investigation is found in the work of Reavis.* This investigator undertook to ferret out the causes of attendance and non-attendance in rural schools. Six factors were selected as presumably having some effect upon school attendance: (1) distance from school, (2) age-grade relationship, (3) kind of work done by pupils, (4) training and experience of the teacher, (5) school equipment, and (6) character of the community. When partial correlation coefficients were calculated, it was found that the original correlations between attendance and distance from school, and between attendance and character of the community, were the least reduced. The first coefficient was lowered from −.45 to −.43, and the second from .30 to .28. Of all the factors selected, therefore, these two seemed to have the most direct and independent influence upon school attendance. As in the problem of temperature and absence cited above, the distinction between cause and effect is clear: distance from school and character of community are evidently the causes, and not the effects, of good or poor school attendance.

* Reavis, Factors Controlling Attendance in Rural Schools, Teachers College, Columbia University, Contributions to Education, No. 108 (1920), 52-69.


2. Multiple Correlation in Analysis 


Multiple correlation is often useful when one wishes to determine the influence of a number of test variables, taken singly and together, upon the criterion variable being studied. As shown in Section I, multiple correlation also enables us to select from a number of tests the most valid battery for forecasting a criterion of worker performance.* A few illustrations of the application of multiple correlation to psychological problems will be cited here; the student will encounter many in the literature. In a group of fifty-seven fourth-grade children, the r between educational achievement and M.A. was .595. Teachers then estimated each child's physical efficiency (vigor, stamina, etc.), and this estimate was added to M.A. The R of educational achievement with M.A. plus physical efficiency was .653, a gain of about .06 point. Emotional maturity (as estimated by teachers) was next added to the battery of M.A. plus physical efficiency. Social maturity (as estimated by teachers) was then added to M.A. plus physical efficiency plus emotional maturity. Neither addition changed the multiple correlation. Gates† concludes: "Physical fitness, then, appears to exert a greater specific influence (i.e., over and above the r with M.A.) upon achievement than does either social or emotional maturity or both combined. Both combined add practically nothing of value to a team of M.A. plus physical fitness for purposes of predicting scholastic achievement."
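For a criterion and two predictors, R follows directly from the three zero-order r's. In the first hypothetical call below, the second predictor correlates with the criterion only as much as its overlap with the first would lead us to expect (r₀₂ = r₀₁r₁₂). R then equals r₀₁ exactly. This is an idealized version of Gates's finding that emotional and social maturity added nothing. Every figure except .595 is invented for illustration.

```python
import math

def multiple_r_two(r01, r02, r12):
    """Multiple R of criterion 0 with predictors 1 and 2."""
    r_sq = (r01**2 + r02**2 - 2 * r01 * r02 * r12) / (1 - r12**2)
    return math.sqrt(r_sq)

# Hypothetical: predictor 2 relates to the criterion only through predictor 1.
print(round(multiple_r_two(0.595, 0.595 * 0.5, 0.5), 3))   # 0.595, no gain
# Hypothetical: an uncorrelated second predictor does raise R.
print(round(multiple_r_two(0.595, 0.30, 0.0), 3))
```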

Burks‡ has made use of multiple correlation in determining the relative contribution of heredity and environment to a child's I.Q. as measured by the Stanford-Binet. The R between I.Q. and parental intelligence test score plus environmental index (by the Whittier Home Scale) was found to be .61 for an N of 105. Since R² is .37, about 37% of the variance of children's intelligence may be attributed to the combined effect of home environment and parents' mental level. Of the 37% accounted for by these two factors, parental intelligence contributed 33% and home environment 4%. The remaining 63% is attributable to factors not measured by these two.

* Stead and Shartle, op. cit., Chapters 5-9 inclusive.
† Gates, A. I., "The Nature and Educational Significance of Physical Status and of Mental, Physiological, Social and Emotional Maturity," Journal of Educational Psychology, 15 (1924), 347-349.
‡ Burks, B. S., The Relative Influence of Nature and Nurture upon Mental Development; a Comparative Study of Foster Parent-Foster Child Resemblance and True Parent-True Child Resemblance, 27th Yearbook, N.S.S.E. (1928), Part I, 219-316.
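One common way to split R² among predictors credits each predictor with β·r, its beta weight times its r with the criterion. These products sum to R². This appears to be the kind of "independent contribution" used for divisions like Burks's 33% and 4% and in the exercises, but that is an inference. A minimal pure-Python sketch follows. It solves the normal equations with a small Gauss-Jordan helper (ours, not a library call) and uses hypothetical correlations.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def beta_contributions(r_xx, r_xy):
    """Beta weights, the products beta_i * r_0i, and R^2 (their sum).
    r_xx: predictor intercorrelations; r_xy: predictor r's with the criterion."""
    betas = solve(r_xx, r_xy)
    parts = [b * r for b, r in zip(betas, r_xy)]
    return betas, parts, sum(parts)


# Hypothetical correlations for two predictors (not Burks's data).
betas, parts, r_sq = beta_contributions([[1.0, 0.4], [0.4, 1.0]], [0.6, 0.5])
for p in parts:
    print(f"{p:.3f} of the criterion variance")
print(f"R^2 = {r_sq:.3f}")
```

Note that a product can come out negative when a predictor acts as a suppressor. The split is then harder to interpret.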


3. Limitations to the Use of Partial and Multiple Correlation 

Certain limitations to the use of partial and multiple corre- 
lation may be indicated in concluding this section. 

(1) In order that partial coefficients of correlation be valid 
measures of relationship, it is necessary that all zero order 
coefficients be computed from data in which the regression is 
linear. If there is any doubt as to linearity, the tests given on 
page 372 should be employed. 

(2) The number of cases in a multiple correlation problem should be large, especially if there are a number of variables. Otherwise the coefficients calculated from the data will have little significance. When studies involving many variables are based on relatively few cases, the coefficients obtained may be misleadingly high or low. The question of accuracy of computation is also involved. A general rule advocated by many workers is that results should be carried to as many decimals as there are variables in the problem. How strictly this rule is to be followed must depend upon the accuracy of the original measures.

(3) A serious limitation to a clear-cut interpretation of a partial r arises from the fact that most of the tests employed by psychologists probably depend upon a large number of "determiners." When we "partial out" the influence of clear-cut and relatively objective factors such as age, height, school grade, etc., we have a reasonably clear notion of what the "partials" mean. But suppose we attempt to hold variability due to "logical memory" constant by partialling out memory test scores from the correlation between general intelligence test scores and educational achievement. In that case the result is by no




(c) Determine the independent contribution of each of the selected factors to crime rate (to R²).
(d) Compare R and R̄. Why is the adjustment fairly large? (see p. 451)
2. (a) What is the probable crime rate (from Problem 1) for a city in which X₆ = 15.0, X₁ = 50%, X₅ = 6.0 and X₇ = 20.00?
(b) For a city in which X₆ = 13, X₁ = 48%, X₅ = 5.0 and X₇ = 22.00?
(c) By how much does the use of multiple R reduce σ(est. X₀)?
3. In Problem 4, page 432:
(a) Work out the regression equation using the Wherry-Doolittle method.
(b) How much shrinkage is there when R is corrected for chance errors (p. 451)?


ANSWERS
1. (a) The R̄'s are, for Test 6, .540; for Tests 6 and 1, .674; for Tests 6, 1, and 5, .713; for Tests 6, 1, 5, and 7, .722. R̄ drops to .702 when Test 4 is added.
(b) X₀ = −.42X₆ + 3.35X₁ + .82X₅ − .40X₇ − 134.59
σ(est. X₀) = 5.47
(c) R² = .121 + .242 + .210 + .043. Tests 6, 1, 5, and 7 contribute 12%, 24%, 21%, and 4%, respectively.
(d) R = .785; R̄ = .722; shrinkage is .063.
2. (a) 23.53
(b) 16.05
(c) From 7.9 to 5.5, or 30%
3. (b) R̄ is .59.
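The answers to Problem 2 can be checked by substituting into the regression equation of answer 1(b). The sketch assumes that each part of Problem 2 lists its four values in the order X₆, X₁, X₅, X₇. That order is inferred only from the fact that it reproduces the published answers. The sketch also assumes that σ(est. X₀) was found as σ√(1 − R̄²), with σ = 7.9 (answer 2(c)) and R̄ = .722. Both assumptions are ours, not statements from the text.

```python
import math

# Regression equation of answer 1(b):
#   X0 = -.42 X6 + 3.35 X1 + .82 X5 - .40 X7 - 134.59
# ASSUMPTION: Problem 2 gives its four values in the order X6, X1, X5, X7.
def predict_crime_rate(x6, x1, x5, x7):
    return -0.42 * x6 + 3.35 * x1 + 0.82 * x5 - 0.40 * x7 - 134.59


def se_estimate(sigma, r):
    """Standard error of estimate, sigma * sqrt(1 - r^2)."""
    return sigma * math.sqrt(1 - r * r)


print(round(predict_crime_rate(15.0, 50, 6.0, 20.00), 2))  # Problem 2(a): 23.53
print(round(predict_crime_rate(13, 48, 5.0, 22.00), 2))    # Problem 2(b): 16.05
# ASSUMPTION: sigma = 7.9 and the corrected R-bar of .722 were used.
print(round(se_estimate(7.9, 0.722), 2))                   # 5.47
```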


REFERENCE TABLES 




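TABLE 17

FRACTIONAL PARTS OF THE TOTAL AREA (TAKEN AS 10,000) UNDER THE NORMAL PROBABILITY CURVE, CORRESPONDING TO DISTANCES ON THE BASELINE BETWEEN THE MEAN AND SUCCESSIVE POINTS LAID OFF FROM THE MEAN IN UNITS OF STANDARD DEVIATION

Example: between the mean and a point 1.38σ (x/σ = 1.38) are found 41.62% of the entire area under the curve.

x/σ     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0    0000    0040    0080    0120    0160    0199    0239    0279    0319    0359
0.1    0398    0438    0478    0517    0557    0596    0636    0675    0714    0753
0.2    0793    0832    0871    0910    0948    0987    1026    1064    1103    1141
0.3    1179    1217    1255    1293    1331    1368    1406    1443    1480    1517
0.4    1554    1591    1628    1664    1700    1736    1772    1808    1844    1879
0.5    1915    1950    1985    2019    2054    2088    2123    2157    2190    2224
0.6    2257    2291    2324    2357    2389    2422    2454    2486    2517    2549
0.7    2580    2611    2642    2673    2704    2734    2764    2794    2823    2852
0.8    2881    2910    2939    2967    2995    3023    3051    3078    3106    3133
0.9    3159    3186    3212    3238    3264    3289    3315    3340    3365    3389
1.0    3413    3438    3461    3485    3508    3531    3554    3577    3599    3621
1.1    3643    3665    3686    3708    3729    3749    3770    3790    3810    3830
1.2    3849    3869    3888    3907    3925    3944    3962    3980    3997    4015
1.3    4032    4049    4066    4082    4099    4115    4131    4147    4162    4177
1.4    4192    4207    4222    4236    4251    4265    4279    4292    4306    4319
1.5    4332    4345    4357    4370    4382    4394    4406    4418    4429    4441
1.6    4452    4463    4474    4484    4495    4505    4515    4525    4535    4545
1.7    4554    4564    4573    4582    4591    4599    4608    4616    4625    4633
1.8    4641    4649    4656    4664    4671    4678    4686    4693    4699    4706
1.9    4713    4719    4726    4732    4738    4744    4750    4756    4761    4767
2.0    4772    4778    4783    4788    4793    4798    4803    4808    4812    4817
2.1    4821    4826    4830    4834    4838    4842    4846    4850    4854    4857
2.2    4861    4864    4868    4871    4875    4878    4881    4884    4887    4890
2.3    4893    4896    4898    4901    4904    4906    4909    4911    4913    4916
2.4    4918    4920    4922    4925    4927    4929    4931    4932    4934    4936
2.5    4938    4940    4941    4943    4945    4946    4948    4949    4951    4952
2.6    4953    4955    4956    4957    4959    4960    4961    4962    4963    4964
2.7    4965    4966    4967    4968    4969    4970    4971    4972    4973    4974
2.8    4974    4975    4976    4977    4977    4978    4979    4979    4980    4981
2.9    4981    4982    4982    4983    4984    4984    4985    4985    4986    4986
3.0    4986.5  4986.9  4987.4  4987.8  4988.2  4988.6  4988.9  4989.3  4989.7  4990.0
3.1    4990.3  4990.6  4991.0  4991.3  4991.6  4991.8  4992.1  4992.4  4992.6  4992.9
3.2    4993.129
3.3    4995.166
3.4    4996.631
3.5    4997.674
3.6    4998.409
3.7    4998.922
3.8    4999.277
3.9    4999.519
4.0    4999.683
4.5    4999.966
5.0    4999.997133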
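The entries of Table 17 can be regenerated from the error function, since the area between the mean and a point z standard deviations away is ½·erf(z/√2). Here is a minimal sketch using only Python's standard library (the function name is ours):

```python
import math

def area_mean_to(z):
    """Fraction of the total area under the normal curve lying between
    the mean and a point z standard deviations from it (either side)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

# Entries in the table's units (total area taken as 10,000).
for z in (0.50, 1.00, 1.38, 1.96):
    print(f"{z:.2f}  {area_mean_to(z) * 10000:.0f}")
```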


TABLE 18

FRACTIONAL PARTS OF THE TOTAL AREA (TAKEN AS 10,000) UNDER THE NORMAL PROBABILITY CURVE, CORRESPONDING TO DISTANCES ON THE BASELINE BETWEEN THE MEAN AND SUCCESSIVE POINTS LAID OFF FROM THE MEAN IN UNITS OF PE

Example: between the mean and a point 1.55 PE (x/PE = 1.55) from the mean are found 35.21% of the entire area under the curve.

x/PE    .00     .05        x/PE    .00       .05
0.0    0000    0135        3.0    4785      4802
0.1    0269    0403        3.1    4817      4832
0.2    0537    0670        3.2    4846      4858
0.3    0802    0933        3.3    4870      4881
0.4    1063    1193        3.4    4891      4900
0.5    1320    1447        3.5    4909      4917
0.6    1571    1695        3.6    4924      4931
0.7    1816    1935        3.7    4937      4943
0.8    2053    2168        3.8    4948      4953
0.9    2281    2392        3.9    4957      4961
1.0    2500    2606        4.0    4965      4968
1.1    2709    2810        4.1    4972      4974
1.2    2909    3004        4.2    4977      4979
1.3    3097    3187        4.3    4981      4983
1.4    3275    3360        4.4    4985      4987
1.5    3442    3521        4.5    4988      4989
1.6    3597    3671        4.6    4990      4991
1.7    3742    3811        4.7    4992      4993
1.8    3876    3939        4.8    4994      4995
1.9    4000    4058        4.9    4995      4996
2.0    4113    4166        5.0    4996      4997
2.1    4217    4265        5.1    4997.1    4997.4
2.2    4311    4354        5.2    4997.7    4998
2.3    4396    4435        5.3    4998.2    4998.5
2.4    4473    4508        5.4    4998.6    4998.8
2.5    4541    4573        5.5    4999      4999.1
2.6    4603    4631        5.6    4999.2    4999.3
2.7    4657    4682        5.7    4999.4    4999.5
2.8    4705    4727        5.8    4999.54   4999.6
2.9    4748    4767        5.9    4999.65   4999.7
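Table 18 is the same curve with the baseline measured in probable errors. Assuming the usual conversion 1 PE ≈ .6745σ, a point d PE from the mean lies .6745d standard deviations from it, and the area follows as in Table 17. A short sketch:

```python
import math

PE_IN_SIGMA = 0.6745   # one probable error is about .6745 standard deviations

def area_mean_to_pe(d):
    """Fraction of the total area between the mean and a point d PE away."""
    return 0.5 * math.erf(abs(d) * PE_IN_SIGMA / math.sqrt(2))

print(round(area_mean_to_pe(1.55) * 10000))   # the table's example: 3521
print(round(area_mean_to_pe(1.00) * 10000))   # one PE on one side: 2500
```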


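TABLE 23

TO FACILITATE THE CALCULATION OF T-SCORES

The percents refer to the percentage of the total frequency below a given score + 1/2 of the frequency on that score. T-scores are read directly from the given percentages.

Percent   T-score      Percent   T-score
.0032       10         53.98       51
.0048       11         57.93       52
.0072       12         61.79       53
.0108       13         65.54       54
.016        14         69.15       55
.023        15         72.57       56
.034        16         75.80       57
.048        17         78.81       58
.069        18         81.59       59
.097        19         84.13       60
.135        20         86.43       61
.19         21         88.49       62
.26         22         90.32       63
.35         23         91.92       64
.47         24         93.32       65
.62         25         94.52       66
.82         26         95.54       67
1.07        27         96.41       68
1.39        28         97.13       69
1.79        29         97.72       70
2.28        30         98.21       71
2.87        31         98.61       72
3.59        32         98.93       73
4.46        33         99.18       74
5.48        34         99.38       75
6.68        35         99.53       76
8.08        36         99.65       77
9.68        37         99.74       78
11.51       38         99.81       79
13.57       39         99.865      80
15.87       40         99.903      81
18.41       41         99.931      82
21.19       42         99.952      83
24.20       43         99.966      84
27.43       44         99.977      85
30.85       45         99.984      86
34.46       46         99.9892     87
38.21       47         99.9928     88
42.07       48         99.9952     89
46.02       49         99.9968     90
50.00       50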
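The T-score for a given percent is 50 + 10z, where z is the normal deviate below which that percent of the area falls. Here is a sketch using `statistics.NormalDist` from the Python standard library (3.8 and later). The function name `t_score` is ours.

```python
from statistics import NormalDist

def t_score(percent):
    """T-score for `percent` = % of cases below the score plus half of those on it."""
    z = NormalDist().inv_cdf(percent / 100)   # normal deviate for that area
    return round(50 + 10 * z)

for p in (0.0032, 15.87, 50.00, 84.13, 99.9968):
    print(p, t_score(p))
```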




TABLE 29

TABLE OF t. FOR USE IN DETERMINING THE RELIABILITY OF STATISTICS. IF N IS LARGE, TABLES 17 AND 18 MAY BE USED.

Example: An (N − 1) = 35 and t = 2.03 means that 5 times in 100 trials a divergence as large as that obtained may be expected in the positive and negative directions.

Degrees of
Freedom                 Probability (P)
(N − 1)      0.50     0.10     0.05     0.02     0.01
   1        1.000     6.31    12.71    31.82    63.66
   2         .816     2.92     4.30     6.96     9.92
   3         .765     2.35     3.18     4.54     5.84
   4         .741     2.13     2.78     3.75     4.60
   5         .727     2.02     2.57     3.36     4.03
   6         .718     1.94     2.45     3.14     3.71
   7         .711     1.90     2.36     3.00     3.50
   8         .706     1.86     2.31     2.90     3.36
   9         .703     1.83     2.26     2.82     3.25
  10         .700     1.81     2.23     2.76     3.17
  11         .697     1.80     2.20     2.72     3.11
  12         .695     1.78     2.18     2.68     3.06
  13         .694     1.77     2.16     2.65     3.01
  14         .692     1.76     2.14     2.62     2.98
  15         .691     1.75     2.13     2.60     2.95
  16         .690     1.75     2.12     2.58     2.92
  17         .689     1.74     2.11     2.57     2.90
  18         .688     1.73     2.10     2.55     2.88
  19         .688     1.73     2.09     2.54     2.86
  20         .687     1.72     2.09     2.53     2.84
  21         .686     1.72     2.08     2.52     2.83
  22         .686     1.72     2.07     2.51     2.82
  23         .685     1.71     2.07     2.50     2.81
  24         .685     1.71     2.06     2.49     2.80
  25         .684     1.71     2.06     2.48     2.79
  26         .684     1.71     2.06     2.48     2.78
  27         .684     1.70     2.05     2.47     2.77
  28         .683     1.70     2.05     2.47     2.76
  29         .683     1.70     2.04     2.46     2.76
  30         .683     1.70     2.04     2.46     2.75
  35         .682     1.69     2.03     2.44     2.72
  40         .681     1.68     2.02     2.42     2.70
  45         .680     1.68     2.02     2.41     2.69
  50         .679     1.68     2.01     2.40     2.68
  60         .678     1.67     2.00     2.39     2.66
  70         .678     1.67     2.00     2.38     2.65
  80         .678     1.66     1.99     2.38     2.64
  90         .677     1.66     1.99     2.37     2.63

TABLE 30

TABLE OF CHI-SQUARE
(The values of χ² are printed in the body of the table.)

[The body of this table is not legible in the source scan.]

Adapted from R. A. Fisher's Statistical Methods for Research Workers, by permission of the publishers.


TABLE 49

CORRELATION COEFFICIENTS AT THE 5% AND 1% LEVELS OF SIGNIFICANCE

Example: When N is 52 and (N − 2) is 50, an r must be .273 to be significant at the .05 level, and .354 to be significant at the .01 level.

Degrees of                          Degrees of
freedom                             freedom
(N − 2)      .05      .01           (N − 2)      .05      .01
   1        .997     1.000             24        .388     .496
   2        .950      .990             25        .381     .487
   3        .878      .959             26        .374     .478
   4        .811      .917             27        .367     .470
   5        .754      .874             28        .361     .463
   6        .707      .834             29        .355     .456
   7        .666      .798             30        .349     .449
   8        .632      .765             35        .325     .418
   9        .602      .735             40        .304     .393
  10        .576      .708             45        .288     .372
  11        .553      .684             50        .273     .354
  12        .532      .661             60        .250     .325
  13        .514      .641             70        .232     .302
  14        .497      .623             80        .217     .283
  15        .482      .606             90        .205     .267
  16        .468      .590            100        .195     .254
  17        .456      .575            125        .174     .228
  18        .444      .561            150        .159     .208
  19        .433      .549            200        .138     .181
  20        .423      .537            300        .113     .148
  21        .413      .526            400        .098     .128
  22        .404      .515            500        .088     .115
  23        .396      .505           1000        .062     .081
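Each entry of Table 49 is the r whose t ratio, t = r√(N − 2)/√(1 − r²), just reaches the critical t for N − 2 degrees of freedom. Solving for r gives r = t/√(t² + df). The sketch below takes the critical t as input rather than computing it. The critical values used are entries from Table 29: 2.01 and 2.68 at 50 degrees of freedom, and 2.23 at 10. The function name is ours.

```python
import math

def r_needed(t_crit, df):
    """Smallest |r| significant at the level of the supplied critical t,
    with df = N - 2, from t = r * sqrt(df) / sqrt(1 - r**2)."""
    return t_crit / math.sqrt(t_crit**2 + df)

print(round(r_needed(2.01, 50), 3))   # .05 level, N = 52: 0.273
print(round(r_needed(2.68, 50), 3))   # .01 level, N = 52: 0.354
```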


TABLE 54

DEVIATES (x/σ) IN TERMS OF σ-UNITS AND ORDINATES (z) FOR GIVEN AREAS MEASURED FROM THE MEAN OF A NORMAL DISTRIBUTION WHOSE TOTAL AREA = 1.00

Area from                           Area from
the Mean*     x/σ       z           the Mean*     x/σ       z
  .00        .000     .399             .26        .706     .311
  .01        .025     .399             .27        .739     .304
  .02        .050     .398             .28        .772     .296
  .03        .075     .398             .29        .806     .288
  .04        .100     .397             .30        .842     .280
  .05        .126     .396             .31        .878     .271
  .06        .151     .394             .32        .915     .262
  .07        .176     .393             .33        .954     .253
  .08        .202     .391             .34        .995     .243
  .09        .228     .389             .35       1.036     .233
  .10        .253     .386             .36       1.080     .223
  .11        .279     .384             .37       1.126     .212
  .12        .305     .381             .38       1.175     .200
  .13        .332     .378             .39       1.227     .188
  .14        .358     .374             .40       1.282     .176
  .15        .385     .370             .41       1.341     .162
  .16        .412     .366             .42       1.405     .149
  .17        .440     .362             .43       1.476     .134
  .18        .468     .358             .44       1.555     .119
  .19        .496     .353             .45       1.645     .103
  .20        .524     .348             .46       1.751     .086
  .21        .553     .342             .47       1.881     .068
  .22        .583     .337             .48       2.054     .048
  .23        .613     .331             .49       2.326     .027
  .24        .643     .324             .50         ∞       .000
  .25        .674     .318

* At the .05 level the CR = 1.98, and at the .01 level 2.63, when (N − 1) = 99.
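Table 54 inverts the normal curve: for an area measured from the mean, it gives the deviate x/σ and the ordinate z at that point. Here is a sketch using `statistics.NormalDist` from the standard library. The area must be below .50, since the deviate for .50 is infinite. The function name is ours.

```python
from statistics import NormalDist

def deviate_and_ordinate(area_from_mean):
    """x/sigma and ordinate z for an area (0 <= area < .50) measured from
    the mean of a unit normal curve."""
    nd = NormalDist()                      # mean 0, sigma 1
    x = nd.inv_cdf(0.5 + area_from_mean)   # raises StatisticsError at area = .50
    return x, nd.pdf(x)

for a in (0.10, 0.25, 0.45):
    x, z = deviate_and_ordinate(a)
    print(f"{a:.2f}  {x:.3f}  {z:.3f}")
```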


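TABLE OF SQUARES AND SQUARE ROOTS OF THE NUMBERS FROM 1 TO 1000

Number    Square   Square Root       Number    Square   Square Root
   1           1      1.000             51      26 01      7.141
   2           4      1.414             52      27 04      7.211
   3           9      1.732             53      28 09      7.280
   4          16      2.000             54      29 16      7.348
   5          25      2.236             55      30 25      7.416
   6          36      2.449             56      31 36      7.483
   7          49      2.646             57      32 49      7.550
   8          64      2.828             58      33 64      7.616
   9          81      3.000             59      34 81      7.681
  10        1 00      3.162             60      36 00      7.746
  11        1 21      3.317             61      37 21      7.810
  12        1 44      3.464             62      38 44      7.874
  13        1 69      3.606             63      39 69      7.937
  14        1 96      3.742             64      40 96      8.000
  15        2 25      3.873             65      42 25      8.062
  16        2 56      4.000             66      43 56      8.124
  17        2 89      4.123             67      44 89      8.185
  18        3 24      4.243             68      46 24      8.246
  19        3 61      4.359             69      47 61      8.307
  20        4 00      4.472             70      49 00      8.367
  21        4 41      4.583             71      50 41      8.426
  22        4 84      4.690             72      51 84      8.485
  23        5 29      4.796             73      53 29      8.544
  24        5 76      4.899             74      54 76      8.602
  25        6 25      5.000             75      56 25      8.660
  26        6 76      5.099             76      57 76      8.718
  27        7 29      5.196             77      59 29      8.775
  28        7 84      5.292             78      60 84      8.832
  29        8 41      5.385             79      62 41      8.888
  30        9 00      5.477             80      64 00      8.944
  31        9 61      5.568             81      65 61      9.000
  32       10 24      5.657             82      67 24      9.055
  33       10 89      5.745             83      68 89      9.110
  34       11 56      5.831             84      70 56      9.165
  35       12 25      5.916             85      72 25      9.220
  36       12 96      6.000             86      73 96      9.274
  37       13 69      6.083             87      75 69      9.327
  38       14 44      6.164             88      77 44      9.381
  39       15 21      6.245             89      79 21      9.434
  40       16 00      6.325             90      81 00      9.487
  41       16 81      6.403             91      82 81      9.539
  42       17 64      6.481             92      84 64      9.592
  43       18 49      6.557             93      86 49      9.644
  44       19 36      6.633             94      88 36      9.695
  45       20 25      6.708             95      90 25      9.747
  46       21 16      6.782             96      92 16      9.798
  47       22 09      6.856             97      94 09      9.849
  48       23 04      6.928             98      96 04      9.899
  49       24 01      7.000             99      98 01      9.950
  50       25 00      7.071            100    1 00 00     10.000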


TABLE OF SQUARES AND SQUARE ROOTS (Continued)

Number    Square   Square Root       Number    Square   Square Root
 101     1 02 01     10.050            151    2 28 01     12.288
 102     1 04 04     10.100            152    2 31 04     12.329
 103     1 06 09     10.149            153    2 34 09     12.369
 104     1 08 16     10.198            154    2 37 16     12.410
 105     1 10 25     10.247            155    2 40 25     12.450
 106     1 12 36     10.296            156    2 43 36     12.490
 107     1 14 49     10.344            157    2 46 49     12.530
 108     1 16 64     10.392            158    2 49 64     12.570
 109     1 18 81     10.440            159    2 52 81     12.610
 110     1 21 00     10.488            160    2 56 00     12.649
 111     1 23 21     10.536            161    2 59 21     12.689
 112     1 25 44     10.583            162    2 62 44     12.728
 113     1 27 69     10.630            163    2 65 69     12.767
 114     1 29 96     10.677            164    2 68 96     12.806
 115     1 32 25     10.724            165    2 72 25     12.845
 116     1 34 56     10.770            166    2 75 56     12.884
 117     1 36 89     10.817            167    2 78 89     12.923
 118     1 39 24     10.863            168    2 82 24     12.961
 119     1 41 61     10.909            169    2 85 61     13.000
 120     1 44 00     10.954            170    2 89 00     13.038
 121     1 46 41     11.000            171    2 92 41     13.077
 122     1 48 84     11.045            172    2 95 84     13.115
 123     1 51 29     11.091            173    2 99 29     13.153
 124     1 53 76     11.136            174    3 02 76     13.191
 125     1 56 25     11.180            175    3 06 25     13.229
 126     1 58 76     11.225            176    3 09 76     13.266
 127     1 61 29     11.269            177    3 13 29     13.304
 128     1 63 84     11.314            178    3 16 84     13.342
 129     1 66 41     11.358            179    3 20 41     13.379
 130     1 69 00     11.402            180    3 24 00     13.416
 131     1 71 61     11.446            181    3 27 61     13.454
 132     1 74 24     11.489            182    3 31 24     13.491
 133     1 76 89     11.533            183    3 34 89     13.528
 134     1 79 56     11.576            184    3 38 56     13.565
 135     1 82 25     11.619            185    3 42 25     13.601
 136     1 84 96     11.662            186    3 45 96     13.638
 137     1 87 69     11.705            187    3 49 69     13.675
 138     1 90 44     11.747            188    3 53 44     13.711
 139     1 93 21     11.790            189    3 57 21     13.748
 140     1 96 00     11.832            190    3 61 00     13.784
 141     1 98 81     11.874            191    3 64 81     13.820
 142     2 01 64     11.916            192    3 68 64     13.856
 143     2 04 49     11.958            193    3 72 49     13.892
 144     2 07 36     12.000            194    3 76 36     13.928
 145     2 10 25     12.042            195    3 80 25     13.964
 146     2 13 16     12.083            196    3 84 16     14.000
 147     2 16 09     12.124            197    3 88 09     14.036
 148     2 19 04     12.166            198    3 92 04     14.071
 149     2 22 01     12.207            199    3 96 01     14.107
 150     2 25 00     12.247            200    4 00 00     14.142


TABLE OF SQUARES AND SQUARE ROOTS (Continued)

Number    Square   Square Root       Number    Square   Square Root
 201     4 04 01     14.177            251    6 30 01     15.843
 202     4 08 04     14.213            252    6 35 04     15.875
 203     4 12 09     14.248            253    6 40 09     15.906
 204     4 16 16     14.283            254    6 45 16     15.937
 205     4 20 25     14.318            255    6 50 25     15.969
 206     4 24 36     14.353            256    6 55 36     16.000
 207     4 28 49     14.387            257    6 60 49     16.031
 208     4 32 64     14.422            258    6 65 64     16.062
 209     4 36 81     14.457            259    6 70 81     16.093
 210     4 41 00     14.491            260    6 76 00     16.125
 211     4 45 21     14.526            261    6 81 21     16.155
 212     4 49 44     14.560            262    6 86 44     16.186
 213     4 53 69     14.595            263    6 91 69     16.217
 214     4 57 96     14.629            264    6 96 96     16.248
 215     4 62 25     14.663            265    7 02 25     16.279
 216     4 66 56     14.697            266    7 07 56     16.310
 217     4 70 89     14.731            267    7 12 89     16.340
 218     4 75 24     14.765            268    7 18 24     16.371
 219     4 79 61     14.799            269    7 23 61     16.401
 220     4 84 00     14.832            270    7 29 00     16.432
 221     4 88 41     14.866            271    7 34 41     16.462
 222     4 92 84     14.900            272    7 39 84     16.492
 223     4 97 29     14.933            273    7 45 29     16.523
 224     5 01 76     14.967            274    7 50 76     16.553
 225     5 06 25     15.000            275    7 56 25     16.583
 226     5 10 76     15.033            276    7 61 76     16.613
 227     5 15 29     15.067            277    7 67 29     16.643
 228     5 19 84     15.100            278    7 72 84     16.673
 229     5 24 41     15.133            279    7 78 41     16.703
 230     5 29 00     15.166            280    7 84 00     16.733
 231     5 33 61     15.199            281    7 89 61     16.763
 232     5 38 24     15.232            282    7 95 24     16.793
 233     5 42 89     15.264            283    8 00 89     16.823
 234     5 47 56     15.297            284    8 06 56     16.852
 235     5 52 25     15.330            285    8 12 25     16.882
 236     5 56 96     15.362            286    8 17 96     16.912
 237     5 61 69     15.395            287    8 23 69     16.941
 238     5 66 44     15.427            288    8 29 44     16.971
 239     5 71 21     15.460            289    8 35 21     17.000
 240     5 76 00     15.492            290    8 41 00     17.029
 241     5 80 81     15.524            291    8 46 81     17.059
 242     5 85 64     15.556            292    8 52 64     17.088
 243     5 90 49     15.588            293    8 58 49     17.117
 244     5 95 36     15.620            294    8 64 36     17.146
 245     6 00 25     15.652            295    8 70 25     17.176
 246     6 05 16     15.684            296    8 76 16     17.205
 247     6 10 09     15.716            297    8 82 09     17.234
 248     6 15 04     15.748            298    8 88 04     17.263
 249     6 20 01     15.780            299    8 94 01     17.292
 250     6 25 00     15.811            300    9 00 00     17.321


TABLE OF SQUARES AND SQUARE ROOTS (Continued)

Number    Square   Square Root       Number    Square   Square Root
 301     9 06 01     17.349            351   12 32 01     18.735
 302     9 12 04     17.378            352   12 39 04     18.762
 303     9 18 09     17.407            353   12 46 09     18.788
 304     9 24 16     17.436            354   12 53 16     18.815
 305     9 30 25     17.464            355   12 60 25     18.841
 306     9 36 36     17.493            356   12 67 36     18.868
 307     9 42 49     17.521            357   12 74 49     18.894
 308     9 48 64     17.550            358   12 81 64     18.921
 309     9 54 81     17.578            359   12 88 81     18.947
 310     9 61 00     17.607            360   12 96 00     18.974
 311     9 67 21     17.635            361   13 03 21     19.000
 312     9 73 44     17.664            362   13 10 44     19.026
 313     9 79 69     17.692            363   13 17 69     19.053
 314     9 85 96     17.720            364   13 24 96     19.079
 315     9 92 25     17.748            365   13 32 25     19.105
 316     9 98 56     17.776            366   13 39 56     19.131
 317    10 04 89     17.804            367   13 46 89     19.157
 318    10 11 24     17.833            368   13 54 24     19.183
 319    10 17 61     17.861            369   13 61 61     19.209
 320    10 24 00     17.889            370   13 69 00     19.235
 321    10 30 41     17.916            371   13 76 41     19.261
 322    10 36 84     17.944            372   13 83 84     19.287
 323    10 43 29     17.972            373   13 91 29     19.313
 324    10 49 76     18.000            374   13 98 76     19.339
 325    10 56 25     18.028            375   14 06 25     19.365
 326    10 62 76     18.055            376   14 13 76     19.391
 327    10 69 29     18.083            377   14 21 29     19.416
 328    10 75 84     18.111            378   14 28 84     19.442
 329    10 82 41     18.138            379   14 36 41     19.468
 330    10 89 00     18.166            380   14 44 00     19.494
 331    10 95 61     18.193            381   14 51 61     19.519
 332    11 02 24     18.221            382   14 59 24     19.545
 333    11 08 89     18.248            383   14 66 89     19.570
 334    11 15 56     18.276            384   14 74 56     19.596
 335    11 22 25     18.303            385   14 82 25     19.621
 336    11 28 96     18.330            386   14 89 96     19.647
 337    11 35 69     18.358            387   14 97 69     19.672
 338    11 42 44     18.385            388   15 05 44     19.698
 339    11 49 21     18.412            389   15 13 21     19.723
 340    11 56 00     18.439            390   15 21 00     19.748
 341    11 62 81     18.466            391   15 28 81     19.774
 342    11 69 64     18.493            392   15 36 64     19.799
 343    11 76 49     18.520            393   15 44 49     19.824
 344    11 83 36     18.547            394   15 52 36     19.849
 345    11 90 25     18.574            395   15 60 25     19.875
 346    11 97 16     18.601            396   15 68 16     19.900
 347    12 04 09     18.628            397   15 76 09     19.925
 348    12 11 04     18.655            398   15 84 04     19.950
 349    12 18 01     18.682            399   15 92 01     19.975
 350    12 25 00     18.708            400   16 00 00     20.000


401 
402 
403 
404 
405 
406 
407 
408 
409 
410 
411 
412 
413 
414 
415 
416 
417 
418 
419 
420 
421 
422 
423 
424 
425 
426 
427 
428 
429 
430 
431 
432 
433 
434 
435 
436 
437 
438 
439 
440 
441 
442 
443 
444 
445 
446 
447 
448 
449 
450 


STATISTICS IN PSYCHOLOGY AND EDUCATION 475 


Taste or SQUARES AND SQUARE 
Number Square Square Root 


16 08 01 
16 16 04 
16 24 09 
16 32 16 
16 40 25 


16 48 36 
16 56 49 


16 89 21 
16 97 44 
17 05 69 
17 13 96 
17 2225 


17 30 56 
17 38 89 
17 47 24 
17 55 61 
17 64 00 


17 7241 
17 80 84 


18 57 61 
18 66 24 
18 74 89 
18 83 56 
18 92 25 


19 89 16 
19 98 09 
20 07 04 
20 16 01 
20 25 00 


20.025 


Number Square 


Roors—Continued 
Square Root 
2034 01 21.237 
20 43 04 21.200 
20 52 09 21.284 
20 61 16 21.307 
20 70 25 21.831 
20 79 36 21.354 
20 88 49 21.378 
20 97 64 21.401 
2106 81 21.424 
21 16 00 21.448 
21 25 21 21.471 
21 34 44 21.494 
2143 69 21.517 
21 52 96 21.541 
21 62 25 21.564 
217156 21.587 
21 80 89 21.610 
21 90 24 21.633 
21 99 61 21.656 
22 09 00 21.679 
22 18 41 21.703 
222784 21.726 
223729 21.749 
224676 21.772 
22 56 25 21.794 
22 65 76 21.817 
22 75 29 21.840 
22 84 21.863 
22 94 41 21.886 
23 04 00 21.909 
23 13 61 21.932 
23 23 24 21.954 
23 32 89 21.977 
23 42 56 22.000 
23 52 25 22.023 
23 61 96 22.045 
23 71 69 22.068 
23 8144 22.091 
23 9121 22.113 
24 01 00 22.136 
24 10 81 22.159 
24 20 64 22.181 
24 30 49 22.204 
24 40 36 22.226 
24 5025 22.249 
24 60 16 22.271 
24 70 09 22.203 
24 80 04 22.316 
24 90 01 22.338 
25 00 22.361 


476 STATISTICS IN PSYCHOLOGY AND EDUCATION 


TABLE ОР Squares AND 8 
Squaro Root 


Square 
25 1001 
25 20 04 
25 30 09 
25 40 16 
25 5025 


25 60 36 
25 70 49 
25 80 64 
25 90 81 
26 01 00 


26 1121 
26 21 44 
26 31 69 
20 41 96 
20 52 25 


22.383 


551 


Bquare 
30 36 01 
30 47 04 
30 58 09 
30 69 16 
30 80 25 


30 91 36 
31 02 49 
31 13 64 
312481 
31 36 00 


31 47 21 
3158 44 
31 69 69 
31 80 96 
319225 


32 03 56 
32 14 89 
32 26 24 
32 37 61 
32 49 00 


32 60 41 
32 71 84 
32 83 29 
32 94 76 
33 06 25 


33 17 76 
33 29 29 
33 40 84 
33 52 41 
33 64 00 


33 75 61 
33 87 24 
33 98 89 
34 10 56 
34 22 25 


34 33 96 
34 45 69 
34 57 44 
34 69 21 
34 81 00 


34 92 81 
35 04 64 
35 16 49 
35 28 36 
35 40 25 


35 52 16 
35 64 09 
35 76 04 
35 88 01 
36 00 00 


QUARE Roors— Continued 
Number 


Bquare Root 
23.473 


~l 


STATISTICS IN PSYCHOLOGY AND EDUCATION 477 


TABLE оғ SquaRES AND SquaRE Roors—Continued 


Square 
36 12 01 
36 24 04 
36 36 09 
36 48 16 
36 60 25 


36 72 36 
36 84 49 
36 96 64 
37 08 81 
37 21 00 


37 33 21 
37 45 44 
37 57 69 
37 69 96 
37 82 25 


37 94 56 
38 06 89 
38 19 24 
38 31 61 
38 44 00 


38 56 41 
38 68 84 
38 81 29 
38 93 76 
39 06 25 


39 18 76 
39 31 29 
39 43 84 
39 56 41 
39 69 00 


39 81 61 
39 94 24 
40 06 89 
40 19 56 
40 32 25 


40 44 96 
40 57 69 
40 70 44 
40 83 21 
40 96 00 


41 08 81 
41 21 64 
41 34 49 
41 47 36 
41 60 25 


41 73 16 
41 86 09 
41 99 04 
42 12 01 
42 25 00 


Square Root 


Number 
651 


Square 
42 38 01 
42 51 04 
42 64 09 
42 77 16 
42 90 25 


43 03 36 
43 16 49 
43 29 64 
43 42 81 
43 56 00 


43 69 21 
43 82 44 
43 95 69 
44 08 96 
44 22 25 


44 35 56 
44 48 89 
44 62 24 
44 75 61 
44 89 00 


45 02 41 
45 15 84 
45 29 20 
45 42 76 
45 56 25 


45 69 76 
45 83 29 
45 96 84 
46 10 41 
46 24 00 


46 37 61 
46 5124 
46 64 89 
46 78 56 
46 92 25 


47 05 96 
47 19 69 
47 33 44 
47 47 21 
47 61 00 


47 74 81 
47 88 64 
48 02 49 
48 16 36 
48 30 25 


48 44 16 
48 58 09 
48 72 04 
48 86 01 
49 00 00 


Square Root 
25.515 
25.534 
25.554 
25.573 
25.593 


25.612 
25.632 


78 STATISTICS IN PSYCHOLOGY AND EDUCATION 


701 


TABLE or Squares AND Square Roors—Continued 
Number Зачаго 


49 14 01 
49 28 04 
49 42 09 
49 56 16 
49 70 25 


49 84 36 
49 98 49 
50 12 64 
50 26 81 
504100 


505521 
50 6944 
50 83 69 
50 97 96 
511225 


5126 56 
51 40 89 
51 55 24 
51 69 61 
51 84 00 


51 98 41 
52 12 84 
52 27 29 
524176 
52 56 25 


527076 
528529 


S 
ы 
a 


ae gees PE 
25 SASS В 


Square Root 
26.476 


27.000 
27.019 


Number 
751 


Square 
56 40 01 
56 55 04 
56 70 09 
56 85 16 
57 00 25 


57 15 36 
57 30 49 
57 45 64 
57 60 81 
57 76 00 


57 91 21 
58 06 44 
58 21 69 
58 36 96 
58 52 25 


58 67 56 
58 82 89 
58 98 24 
59 13 61 
59 29 00 


59 44 41 
59 59 84 
59 75 29 
59 90 76 
60 06 25 


60 21 76 
60 37 29 
60 52 84 
60 68 41 
60 84 00 


60 99°61 
611524 
61 30 89 
61 46 56 
61 62 25 


61 77 96 


Square Root 
27.404 
27.423 
27.441 
27.459 
27.477 


27.495 
27.514 
27.532 
27.550 
27.568 


27.586 
27.604 
27.622 
27.641 
27.659 


27.677 
27.695 
27.713 
27.731 
27.749 


27.707 
27.785 
27.803 
27.821 
27.839 


27.857 
27.875 


Wr 


А 


4 


801 
802 
805 


STATISTICS IN PSYCHOLOGY AND EDUCATION 479 


m — oF SQUARES AND SQUARE Roots—Conlinued 


48 09 


аге 988868 ЕВЕР LFL 
БЕРІ Sees ER 
apga БЕЗЕР SESS 55 


an 
S228 BASSI 


99 05 61 
69 22 24 
69 38 89 
69 55 56 
69 72 25 


69 88 96 
70 05 69 
70 22 44 
70 39 21 
70 56 00 


70 72 81 
70 89 64 
71 06 49 
71 23 36 
71 40 25 


71 57 16 
71 74 09 
71 91 04 
72 08 01 
72 25 00 


Square Root 
28.302 
28.320 


Number 


Square 
72 42 01 
72 59 04 
72 76 09 
72 93 16 
73 1025 


73 27 36 
73 44 49 
73 61 64 
73 78 81 
73 96 00 


74 13 21 
74 30 44 
74 47 69 
74 64 96 
74 82 25 


74 99 56 
75 16 89 
75 34 24 
75 51 61 
75 69 00 


75 86 41 
76 03 84 
76 2129 
76 38 76 
76 56 25 


76 73 76 
76 91 29 
77 08 84 
7726 41 
77 44 00 


77 61 61 
77 19 24 
77 96 89 
73 14 56 
78 32 25 


78 49 96 
78 67 69 
78 85 44 
79 03 21 
79 21 00 


79 38 81 
79 56 64 
79 74 49 
79 92 36 
80 1025 


80 28 16 
80 46 09 
80 34 04 
80 82 01 
81 00 00 


Square Root 


480 STATISTICS IN PSYCHOLOGY AND EDUCATION 


TABLE OF SQUARES AND Si 


Square 
811801 
81 36 04 
81 54 09 
81 72 16 
81 90 25 


82 08 36 


8 
w 
e 
oo 
oo 


RE PERES SS 
Е5588 SE 


e 

88 
A 
= 


86 67 61 
86 86 24 
87 04 89 
87 23 56 
87 4225 


87 60 96 
87 79 69 
87 98 44 
881721 
88 36 00 


885481 
88 73 64 
88 92 49 
89 1136 
89 30 25 


89 49 16 
89 68 09 
89 87 04 
90 06 01 
90 25 00 


Square Root 
30.017 


Number 


Square 

90 44 01 
90 63 04 
90 82 09 
91 01 16
91 20 25


91 39 36 
91 58 49
91 77 64 
91 96 81 
92 16 00 


92 35 21 
92 54 44 
92 73 69 
92 92 96 
93 12 25 


93 31 56 
93 50 89 
93 70 24 
93 89 61 
94 09 00 


94 28 41 
94 47 84 
94 67 29 
94 86 76 
95 06 25 


95 25 76 
95 45 29 
95 64 84 
95 84 41 
96 04 00 


96 23 61 
96 43 24 
96 62 89 
96 82 56 
97 02 25 


97 21 96 
97 41 69 
97 61 44
97 81 21
98 01 00


98 20 81
98 40 64 
98 60 49 
98 80 36 
99 00 25 


99 20 16
99 40 09 
99 60 04 
99 80 01 
100 00 00 


Square Root
30.838
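Every entry in the table follows a fixed rule, so the rows can be regenerated mechanically. The following is a minimal Python sketch. It assumes the layout seen on these pages: the last four digits of each square are set off in pairs (757 gives 57 30 49, 1000 gives 100 00 00), and each square root is rounded to three decimals. The function names are illustrative and do not come from the text.

```python
import math


def format_square(n: int) -> str:
    """Format n**2 as the printed table does: the last four digits are set
    off in two pairs, e.g. 757 -> "57 30 49" and 1000 -> "100 00 00"."""
    if n < 100:
        # Assumed layout only makes sense for squares of five or more digits.
        raise ValueError("layout assumes n >= 100")
    s = str(n * n)
    return f"{s[:-4]} {s[-4:-2]} {s[-2:]}"


def format_root(n: int) -> str:
    """Square root of n rounded to three decimal places."""
    return f"{math.sqrt(n):.3f}"


def table_rows(start: int, stop: int):
    """Yield (number, square, square root) for start <= n <= stop."""
    for n in range(start, stop + 1):
        yield n, format_square(n), format_root(n)


if __name__ == "__main__":
    # Print a few rows in Number / Square / Square Root order.
    for number, square, root in table_rows(951, 955):
        print(f"{number:>5}  {square:>10}  {root:>8}")
```

You can check a single printed entry by comparing it with the output of `format_square` or `format_root` for the same number.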


INDEX 


Accuracy, standards of, in com- 
putation, 23-27 

Ackerson, Luton, 389 

Adkins, D. C., 402 

Analysis of variance, principles of; how variances are analyzed, 254-297; illustrations of, 258-264

Anastasi, A., 390, 430 

Anderson, J. E., 401 

Array, in a correlation table, 277

Attenuation, correction of cor- 
relation coefficient for, 396- 
398; assumptions underlying, 
397-398 

Average, definition of, 32; of cor- 
relation coefficients, 302-303. 
See Mean, Median, and Mode. 


Bagley, W. C., 431 
Bar diagram, 96-97 
Barr, A. S., 332 

Beta coefficients, in partial and multiple correlation, 421-422; as "weights," 422; calculation of, in Wherry-Doolittle method, 448-450

Bias in sampling. See Sampling. 

Binomial expansion, use in proba- 
bility, 106-109; graphic repre- 
sentation of, 108-109 

Bi-serial correlation, 347-351; calculation of r_bis, 348-350; SE of r_bis, 350-351; alternate formula for, 352

Brigham, C. C., 198 

Burks, B. S., 454, 456 


Buros, F. C., and Buros, O. K., 82
Burt, Cyril, 451 


Central tendency, measures of, 
32-34; reliability of measures 
of, 182-193. See Mean, Me- 
dian, and Mode. 

Chesire, L., 357 

Chi-square test, as a measure of
goodness of fit, 241-253; as a 
measure of divergence from the 
null hypothesis, 241-245, and 
from the normal distribution, 
245-246; when table entries 
are small, 246-250; when table 
entries are in percentages, 250; 
in contingency tables, 251-253; 
as measure of linearity of re- 
gression, 372-374 

Clark, E. L., 390 

Classification of measures into a 
frequency distribution, 4-7 

Class-interval, definition of, 5; 
methods of expressing, 7-10; 
midpoint of, 9; limits of, 8 

Clayton, B., 389 

Coefficient, of alienation, 335-336; of determination, in the interpretation of r, 337-339; of variation, or V, 65-68; of reliability in correlation, 380-386; dependence of reliability coefficient upon variability of groups, 393-394

Coefficient of correlation, meaning of, 268-270; as a ratio, 272-275; represented graphically, 278-282; computation of, deviations from assumed means, 282-288; computation of, deviations from means, 288-291; reliability of, 297-302; averaging of, 302-303; effect of variability upon, 325-327; interpretations of, 332-339
Coin tossing, probabilities in, 105- 
109 
Column diagram. See Histogram. 
Comparison, of obtained distri- 
bution with normal probability 
curve, 123-127; of groups in 
terms of overlapping, 139-140. 
See also Chi-square, Skewness, 
and Kurtosis.
Computation, rules for, 26-27 
Confidence intervals, meaning of,
187-188 
Conrad, H. S., 337 
Contingency, coefficient of (C), 
359-365; methods of comput- 
ing C, 360-365; relation of C to 
chi-square, 359; comparison of
C with r, 363 
Continuous series, 2-3; tabulation of measures in, 3-4
Coördinate axes, 12; use in a correlation table, 315-317
Correlation, linear, 278-282; positive, negative, and zero, 270; expressed as a ratio, 272-275; graphic representation of, 278-282; construction of table, 275-278; product-moment method in, 282-288; charts for use in, 288; from ungrouped data, 288-296; difference formula in, 295-296; effect of errors of observation upon, 396-398; rank difference method of computing, 343-347; spurious, 429-432. See also Partial correlation and Multiple correlation.
Correlation-ratio (eta), in non-linear relationship; computation of, 368-370; standard error of, 371; correction of, 371-372; comparison with r to determine linearity of regression, 372-374
Criterion, value of, in measuring the validity of tests; prediction of, by multiple regression equation, 413, 419
Critical ratio, definition of, 199-200. See t-test.
Cumulative о, method 
of computing, 74-7 Е 
Cumulative frequency graph, construction of, 75; smoothing of, 91-92
Cureton, E. E., 288, 417
Curvilinear relationship, 305-372 


Data, continuous and discrete, 
2-3 

Deciles. See Percentiles.

Degrees of freedom, meaning of, 191-193; in analysis of variance, 257, 261

Deviation. See Quartile deviation, Mean deviation, Standard deviation.
Differences, reliability of, between measures of central tendency, 197-214; between measures of variability, 215-220; between percentages, 219; between r’s, 302. See Standard error and Probable error.
Discrete series, 2-3; Short Method applied to, 68-70
Distribution, frequency. See Frequency distribution.




Dunlap, J. W., 218, 221, 288, 417 
Durost, W. N., 288 
Dvorak, August, 288, 368 


Edgerton, H. A., 319

Elliott, K. M., 436

Equation, of a straight line, 157; plotting of equations for regression lines in correlation diagram, 315-317

Equivalent groups, method of, 
211-214 

Error, curve of, 111. See also Normal curve.

Errors, of sampling, 225-226; 
constant, 227, 387; chance, 
226, 386. See also Probable 
and Standard errors. 

Experimental hypotheses, testing 
of, 232-234; null hypothesis, 
199-200, 232-233 

Ezekiel, M., 337, 417, 451 


Ferguson, G. A., 401 

Fisher, R. A., 188, 191, 236, 254, 
302 

Flanagan, J. C., 402

Franzen, R., 67

Frequency distribution, construction of, 4-10; normalizing a, 149-151; graphical representation of, 11-16

Frequency polygon, construction of, 12-16; smoothing of, 16-18; comparison with histogram, 23-24

Froelich, C. J., 385

F-test, in analysis of variance, 258-262


Garrett, H. E., 187, 430

Gates, A. I., 454

Goulden, C. H., 188, 246, 254, 261 
Graphic representation, principles of, 10-12; of correlation coefficient, 278-282. See also Frequency polygon, Histogram, Cumulative frequency graph, Percentile curve or Ogive, Line graph, Bar diagram.

Grouping, in tabulating a fre- 
quency distribution, 7-8; as- 
sumptions in, 9-10 

Guilford, J. P., 402 


Hartshorne, H., 219 

Hawkes, Lindquist & Mann, 129, 
402 

Heterogeneity, effect of, upon 
correlation, 325-327; upon the 
reliability of measures, 393-394 

Hillegas, M. A., 162 

Histogram, 19-21; comparison of, 
with frequency polygon, 20-23 

Holzinger, K. J., 368, 389 

Homogeneity, 49; effect of, upon 
variability, 325-327 

Hull, C. L., 173 


Interval. See Class-interval. 

Item analysis, problem of, 399-401; and selection, 400; and difficulty of, 400; and validity, 401


Jackson, J. D., 389 
Jones, D. C., 118 
Jones, H. E., 334 


Karsten, K. G., 93 

Kelley, T. L., 121, 182, 209, 216, 
221, 359, 391 

Kelly, E. L., 390 

Kendall, M. G., 362, 370, 423 


Kuder, G. F., 383 
Kurtosis, calculation of, 121-123; standard error of, 222
Kurtz, A. K., 218, 221 




Levels of confidence, 187-188 

Likert, R., 164 

Lindquist, E. F., 188, 213, 254, 
258, 262, 288, 302 

Linearity of regression, tests for, 
372-373 

Line graphs, 93-95 

Line of means, plotting of, 315- 
317 

Long, J. A., 352, 402 


Martin, C. B., 337
Matched groups, method of, 212-214


May, M. A., 219, 408 

McCall, W. A., 150

McNemar, Q., 102, 112, 223, 393

Mean, arithmetic, calculation of, from ungrouped scores, 32; from frequency distribution, 33-34; when to use, 45; when data are discrete, 68-70; reliability of, 182-193; limits of accuracy for, 184-185

Mean deviation, or MD, calcula- 
tion of, from ungrouped data, 
55-56; from grouped data, 56- 
58; when to use, 71 

Median, calculation of, from ungrouped scores, 34-36; from frequency distribution, 36-38; in special cases, 39-40; when to use, 45; when data are discrete, 68-70; reliability of, 193-194

Merrill, M. A., 330, 393

Midpoint of interval, how to find, 7-10; as representative of all of the scores on the interval, 9


Mode, calculation of, 40-41; when to use, 45


Moore, T. V., 456
Morgan, J. J. B., 233 




Moving average, use in smoothing a curve, 16-17
Multiple coefficient of correlation, R, 400; computation of, in a three-variable problem, 414, 424; formulas for, 423-426; significance of, 426-429; value of, in analysis, 454-455; limitations to use of, 455-456; “shrinkage” in, 450-451


Non-linear relationship, measurement of, 365-372

Normality, divergence of frequency distribution from, 127-133; normalizing a frequency distribution, 149-155; T-scores, 150

Normal probability curve, 104; illustrations of, 102-10; deduction from binomial expansion, 108-109; in psychological measurement, 111; equation of, 113; properties of, 113-114; constants of, 118; comparison of obtained distribution with, 123-125; use in solution of a variety of problems, 135-146; in scaling test items, 146-149; in scales, 160-164; in scaling judgments, 169-171; in transmutation of orders of merit into units of amount, 171-176; in testing hypotheses by chi-square, 245-246

Null hypothesis, testing of, by direct determination of probable outcomes, 234-237; against normal curve frequencies, 238-241; in determining the significance of coefficients of correlation, 298-301




Numbers, rounded, 24; exact and approximate, 25-26

Ogburn, W. F., 457 

Ogive, construction of, 83-87; 
percentiles and percentile ranks 
from, 84-87; uses of, 87-92 

Order of merit, ranks, 171-172; 
changing into numerical scores, 
171-175

Otis, A. S., 288, 382 

Overlapping, in the measurement 
of groups, 139-140 


Parallel forms method, in relia- 
bility of test scores, 381-382 

Parameter, definition of, 181 

Partial correlation, 404-406; il- 
lustration of, in a three-variable 
problem, 406-414; notation in, 
415; formulas for, 415; signif- 
icance of, 417; value of, in 
analysis, 451; limitations
to the use of, 455-456 

Paterson, D. G., 158, 436 
Pearson, Karl, 275, 356, 357, 363, 371

Percentage, standard error of, 
218-220; standard error of the 
difference between, 219 
Percentile, construction of curve of, 83-87; uses of curve, 84-89; ranks (PR), computation of, 80-87; graphic method of finding ranks, 85; scale, use of, in combining test scores, 158; scale, disadvantages of, 157-160
Percentiles, calculation of, 77-87; graphic method of finding, 85-


Peters, C. C., 327, 357, 359, 363, 370, 371
Phillips, F. E., 452




Pintner, R., 158 

Predictions, accuracy of, from re- 
gression equations, 320-332; 
accuracy of group, 322-325; 
“regression effect" in, 331- 
332; from multiple regression 
equations, 422 

Probability, elementary principles 
of, 104-110 

Probable error, relation to Q, 54; 
relation to other measures of 
variability in the normal dis- 
tribution, 119; of the mean, 
187; of Q, 196; of σ, 194

Probable error, of estimate, 320; 
of r, 297 

Product-moment method of find- 
ing r, 282-288 


Quartile deviation (Q), calcula- 
tion of, 51-54; when to use, 71; 
reliability of, 196 

Quartiles, Q1 and Q3, computation of, 51-54


Range, as a measure of variability, 50; when to use, 71; influence upon the coefficient of correlation, 325-327

Rank-difference method of computing correlation, 343-347; when to use, 343-344

Ranks, transmutation of, into 
units of amount, 171-176 

Rational equivalence, method of, 
in test reliability, 383-386 
Reavis, George, 453

Regression coefficient, 312-313;
in partial and multiple correla- 
tion, 419-422 

Regression effect, reasons for, 331-332
Regression equations, 311-318; in deviation form, 311-314; in score form, 317-318; in correlation table, 310; formulas for, in partial and multiple correlation, 419-422; value of, in prediction and control, 451-456; limitations to use of, 455-456

Relative variability, coefficient of,
65-68. See also Coefficient of 
variation. 

Reliability, meaning of, 181-182; of the mean, 182-193; of the median, 193-194; of Q, 196; of σ, 194-196; of a percentage, 218-220; of differences, 197-214; in small samples, 204-207; sampling and reliability, 222-227; of test scores, 380-391; index of, 391-392; dependence of coefficient of, upon the size and variability of the group, 393-394

Remmers, H. H., 390

Rhine, J. B., 233

Richardson, M. W., 353, 383, 402 

Ruch, G. M., 389 

Rugg, Н. O., 93 

Russell, J. T., 324 


Saffir, M., 357

Sampling, random, 223-225; representative, 224; selection in, 226-227; reliability and, 222-227; biased, 223, 226

Sandiford, Peter, 352, 402 

Scale, definition of, 146 

Scaling, of test items, 146-149; of total scores, 149-160; of answers to a questionnaire, 164-169; of judgments or ratings, 169-171. See also Percentile scale, T-scale.




Scatter diagram, 275-278 


Scores, in continuous and in discrete series, 2-3
Semi-interquartile range, 54. See Quartile deviation.
Shartle, C. L., 319, 334, 395, 433, 454
Shock, N. W., 390
Significance, levels of, 201-203; .05 level, 201-202; .01 level, 203; table for determining, 190-191; .05 and .01 tables of, for r, 299
Significant figures, 24-25 


Skewness, measurement of, 119-121; standard error of measures of, 220-221; causes of, 127-133
Snedecor, G. W., 246, 254, 258, 
262, 374
Spearman-Brown prophecy formula, in test reliability
Split-half method, in reliability of test scores, 382-383
Spurious correlation, 429-432; arising from heterogeneity, 429-430; of indices, 430-431; of averages, 431-432
Stalnaker, J. L., 353
Standard deviation or σ, calculation of, 58-60; calculation of, by Short Method, 60-62; calculation of, from raw scores, 62-63; in special cases, 64-6; when to use, 71; reliability of, 194-196; estimation of true value of, 398-399
Standard error, of a mean, in large samples, 184; in small samples, 189; of a median, 193; of σ, 194; limits of accuracy in, 186; of Q, 196; of the difference between means, 198; of the difference between measures of variability, 215; of r, 297-302; table for finding the reliability of the differences in terms of, 190-191; of a percentage, 218-220

Standard error of an obtained 
score, 392-393 

Standard error, of estimate, 320-321; in the interpretation of r, 335-337; in partial and multiple correlation, 422-423

Standard scores, 156-157; compared with T-scores, 157

Statistic, definition of, 181 

Stead, W. H., 319, 334, 395, 435, 
454 

Student's distribution, 191-192; 
table of, 190 

Symonds, P. M., 295


Tables, of areas under normal curve, in terms of σ, 114-115; in terms of PE, 116-117; of ordinates of the normal curve, 126; of t, 190

Tabulation, of measures in a frequency distribution, 4-10; in a correlation table, 275-278

Taylor, H. C., 324 

Terman, L. M., 330, 393 
Test items, relative difficulty of, 141-143; analysis of, 399-402

Test-retest method, in reliability 
of test scores, 381 
Tetrachoric correlation, 353-359; calculation of, 354-359; SE of; use of diagrams in
Thomson, G. H., 430
Thorndike, E. L., 103, 129, 183
Thorndike, R. L., 331
Thurstone, L. L., 67, 161, 357

Tippett, L. H. C., 188

Trabue, M. R., 193

Transmutation of measures, 160-175; of judgments, 169-171; of orders of merit, 171-176
T-scale, 149-155; definition of, 
148; advantages of, 155 
t-test, meaning of, 191-193; table 
of t, 190 


Validity, measurement of, in a 
test, 394-399; in terms of cri- 
teria, 394-396; indirect meas- 
ures of, 396; relation of, to 
reliability, 395; of test battery, 
399 

Van Voorhis, W. R., 327, 357, 359, 
363, 370, 371 

Variability, meaning of, 49-50; measures of, 50; coefficient of relative variability, 65-68; reliability of measures of, 194-196. See also Mean deviation, Quartile deviation, Range, Standard deviation.

Variance, in the interpretation of 
r, 331-339 

Walker, H. M., 223, 275, 288, 311 

Weldon's experiment, 109 

Wherry, R. J., 438, 451

Wherry-Doolittle Test Selection 
Method, 435-451; illustration 
of, 436-451; shrinkage formula 
in, 438; regression equations 
in, 448-450; beta weights and 
multiple R, 425-426, 448-451 

Wilks, S. S., 213

Williams, J. H., 93 

Woo, T. L., 252 

Woodworth, R. S., 162 

Wright, Sewall, 339 

Yule, G. U., 103, 109, 183, 184, 
362, 370, 423, 430 


z-scores. See Standard scores, 156-157.




